SparkMLPipeline
- class sparklightautoml.pipelines.ml.base.SparkMLPipeline(ml_algos, force_calc=True, pre_selection=None, features_pipeline=None, post_selection=None, name=None, persist_before_ml_algo=False, computations_settings=None)
Bases: MLPipeline, TransformerInputOutputRoles
Spark version of MLPipeline. A single ML pipeline that merges together the stages of building an ML model (every step except model training is optional):
- Pre selection: select features from the input data. Performed by SelectionPipeline.
- Features generation: build new features from the selected ones. Performed by SparkFeaturesPipeline.
- Post selection: one more selection step, applied to the created features. Performed by SelectionPipeline.
- Hyperparams optimization for one or multiple ML models. Performed by ParamsTuner.
- Train one or multiple ML models. Performed by SparkTabularMLAlgo. This is the only required step; at least one model must be trained.
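The ordering of these stages can be sketched in plain Python. This is a conceptual illustration only, not the sparklightautoml implementation: `ToyPipeline`, `select_nonconstant`, `add_squares`, and `MeanModel` are hypothetical stand-ins, data is a dict of column lists rather than a SparkDataset, and the hyperparameter optimization step is omitted for brevity.

```python
from typing import Callable, Dict, List, Optional, Sequence

Columns = Dict[str, List[float]]
Stage = Callable[[Columns], Columns]


def identity(cols: Columns) -> Columns:
    """Used in place of an optional stage that was not supplied."""
    return cols


def select_nonconstant(cols: Columns) -> Columns:
    """Hypothetical selection step: drop columns with one distinct value."""
    return {name: vals for name, vals in cols.items() if len(set(vals)) > 1}


def add_squares(cols: Columns) -> Columns:
    """Hypothetical feature generation: add a squared copy of each column."""
    out = dict(cols)
    for name, vals in cols.items():
        out[f"{name}_sq"] = [x * x for x in vals]
    return out


class MeanModel:
    """Hypothetical model: always predicts the mean of the training target."""

    def fit(self, cols: Columns, target: List[float]) -> "MeanModel":
        self.mean_ = sum(target) / len(target)
        return self

    def predict(self, cols: Columns) -> List[float]:
        n_rows = len(next(iter(cols.values())))
        return [self.mean_] * n_rows


class ToyPipeline:
    """Chains the optional stages, then trains every supplied model."""

    def __init__(
        self,
        models: Sequence[MeanModel],
        pre_selection: Optional[Stage] = None,
        features: Optional[Stage] = None,
        post_selection: Optional[Stage] = None,
    ) -> None:
        # Model training is the only mandatory stage.
        if not models:
            raise ValueError("at least one model is required")
        self.models = list(models)
        self.stages = [
            pre_selection or identity,
            features or identity,
            post_selection or identity,
        ]
        self.used_features_: List[str] = []

    def fit_predict(self, cols: Columns, target: List[float]) -> Dict[str, List[float]]:
        # Run pre selection, feature generation, post selection in order.
        for stage in self.stages:
            cols = stage(cols)
        self.used_features_ = sorted(cols)
        # One prediction column per trained model.
        return {
            f"model_{i}": model.fit(cols, target).predict(cols)
            for i, model in enumerate(self.models)
        }
```

In this sketch, skipping a stage is the same as passing `None` for it, which mirrors the `None` defaults of `pre_selection`, `features_pipeline`, and `post_selection` in the class signature.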
- fit_predict(train_valid)
Fit on train/valid iterator and transform on validation part.
- Parameters:
train_valid (SparkBaseTrainValidIterator) – Dataset iterator.
- Return type:
- Returns:
Dataset with predictions of all models.
- predict(dataset)
Predict on new dataset.
- Parameters:
dataset (SparkDataset) – Dataset used for prediction.
- Return type:
- Returns:
Dataset with predictions of all trained models.
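The difference between fit_predict and predict can be illustrated with a small plain-Python sketch. It does not use sparklightautoml: `k_folds`, `ShiftedMean`, `pipeline_fit_predict`, and `pipeline_predict` are hypothetical names, and the behaviour shown at prediction time (averaging the models fitted on each fold) is an assumption made for illustration, not something the text above states. What the sketch does take from the text is that fit_predict returns predictions for the validation parts, and that both methods return one set of predictions per model.

```python
from typing import Dict, List, Sequence, Tuple

# (train row indices, validation row indices)
Fold = Tuple[List[int], List[int]]


def k_folds(n_rows: int, k: int) -> List[Fold]:
    """Hypothetical stand-in for a train/valid iterator."""
    folds = []
    for i in range(k):
        valid = list(range(i, n_rows, k))
        train = [r for r in range(n_rows) if r % k != i]
        folds.append((train, valid))
    return folds


class ShiftedMean:
    """Hypothetical model: predicts the training-target mean plus a fixed shift."""

    def __init__(self, shift: float) -> None:
        self.shift = shift
        self.fold_means: List[float] = []

    def fit_predict(self, target: Sequence[float], folds: Sequence[Fold]) -> List[float]:
        # Each row is predicted by the model that did NOT see it in training.
        out_of_fold = [0.0] * len(target)
        self.fold_means = []
        for train, valid in folds:
            mean = sum(target[r] for r in train) / len(train)
            self.fold_means.append(mean)
            for r in valid:
                out_of_fold[r] = mean + self.shift
        return out_of_fold

    def predict(self, n_rows: int) -> List[float]:
        # Assumption: average the per-fold models for new data.
        avg = sum(self.fold_means) / len(self.fold_means)
        return [avg + self.shift] * n_rows


def pipeline_fit_predict(
    models: Sequence[ShiftedMean], target: Sequence[float], folds: Sequence[Fold]
) -> Dict[str, List[float]]:
    """Validation-part predictions, one column per model."""
    return {f"model_{i}": m.fit_predict(target, folds) for i, m in enumerate(models)}


def pipeline_predict(models: Sequence[ShiftedMean], n_rows: int) -> Dict[str, List[float]]:
    """Predictions on new rows, one column per trained model."""
    return {f"model_{i}": m.predict(n_rows) for i, m in enumerate(models)}
```

A typical call order under these assumptions is `pipeline_fit_predict(models, target, k_folds(len(target), 2))` followed by `pipeline_predict(models, n_new_rows)`; calling the second before the first would fail because no fold models exist yet.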