SparkMLPipeline

class sparklightautoml.pipelines.ml.base.SparkMLPipeline(ml_algos, force_calc=True, pre_selection=None, features_pipeline=None, post_selection=None, name=None, persist_before_ml_algo=False, computations_settings=None)

Bases: MLPipeline, TransformerInputOutputRoles

Spark version of MLPipeline. Single ML pipeline.

Combines the stages of building an ML model (every step except model training is optional):

  • Pre selection: select features from input data. Performed by SelectionPipeline.

  • Features generation: build new features from the selected ones. Performed by SparkFeaturesPipeline.

  • Post selection: a second selection step, applied to the generated features. Performed by SelectionPipeline.

  • Hyperparameter optimization for one or more ML models. Performed by ParamsTuner.

  • Train one or more ML models. Performed by SparkTabularMLAlgo. This is the only required step: at least one model must be trained.
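The stage order above can be illustrated with a small, self-contained sketch. It does not use sparklightautoml or Spark: `ToyMLPipeline`, `identity`, and the helper functions are hypothetical stand-ins, data is a plain dict of columns, and the hyperparameter optimization step is left out for brevity. The only points it is meant to show are that each optional stage falls back to a no-op and that at least one model is required.

```python
from typing import Callable, Dict, List, Optional

Columns = Dict[str, List[float]]
Stage = Callable[[Columns], Columns]


def identity(cols: Columns) -> Columns:
    """No-op used when an optional stage is not configured."""
    return cols


class ToyMLPipeline:
    """Hypothetical stand-in that mirrors the stage order described above."""

    def __init__(
        self,
        ml_algos: List[Callable[[Columns], List[float]]],
        pre_selection: Optional[Stage] = None,
        features_pipeline: Optional[Stage] = None,
        post_selection: Optional[Stage] = None,
    ) -> None:
        # Model training is the only required stage.
        if not ml_algos:
            raise ValueError("at least one ML algorithm is required")
        self.ml_algos = ml_algos
        self.pre_selection = pre_selection or identity
        self.features_pipeline = features_pipeline or identity
        self.post_selection = post_selection or identity

    def run(self, cols: Columns) -> Dict[str, List[float]]:
        cols = self.pre_selection(cols)      # select input features
        cols = self.features_pipeline(cols)  # build new features
        cols = self.post_selection(cols)     # select among generated features
        # One output column per model.
        return {f"algo_{i}": algo(cols) for i, algo in enumerate(self.ml_algos)}


def keep_a_and_b(cols: Columns) -> Columns:
    return {k: v for k, v in cols.items() if k in ("a", "b")}


def add_sum(cols: Columns) -> Columns:
    out = dict(cols)
    out["a_plus_b"] = [x + y for x, y in zip(cols["a"], cols["b"])]
    return out


def row_mean(cols: Columns) -> List[float]:
    """Toy 'model': the mean of all feature values in each row."""
    names = sorted(cols)
    n_rows = len(cols[names[0]])
    return [sum(cols[c][i] for c in names) / len(names) for i in range(n_rows)]


data = {"a": [1.0, 2.0], "b": [3.0, 4.0], "noise": [9.0, 9.0]}
pipe = ToyMLPipeline([row_mean], pre_selection=keep_a_and_b, features_pipeline=add_sum)
preds = pipe.run(data)
```

Here `post_selection` is omitted, so the generated feature passes straight through to the model, and the `noise` column never reaches it because pre selection drops it.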

fit_predict(train_valid)

Fit on the train/valid iterator and compute predictions for the validation part.

Parameters:

train_valid (SparkBaseTrainValidIterator) – Dataset iterator.

Return type:

SparkDataset

Returns:

Dataset with predictions of all models.
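As a rough illustration of "fit on train, predict on validation", the sketch below uses plain Python lists instead of a SparkDataset. The names `k_fold_indices` and `fit_predict_mean` are hypothetical, the "model" is just the mean of the training targets, and the fold-based split is an assumption: how SparkBaseTrainValidIterator actually splits the data (folds, holdout, or otherwise) is not specified here.

```python
from typing import Iterator, List, Tuple

Fold = Tuple[List[int], List[int]]  # (train row indices, valid row indices)


def k_fold_indices(n_rows: int, n_folds: int) -> Iterator[Fold]:
    """Hypothetical train/valid iterator: row i is validated in fold i % n_folds."""
    for fold in range(n_folds):
        valid = [i for i in range(n_rows) if i % n_folds == fold]
        train = [i for i in range(n_rows) if i % n_folds != fold]
        yield train, valid


def fit_predict_mean(target: List[float], n_folds: int) -> List[float]:
    """Fit a mean predictor on each train part and predict its valid part."""
    predictions = [0.0] * len(target)
    for train_idx, valid_idx in k_fold_indices(len(target), n_folds):
        # "Fit": the model is the mean of the training targets.
        fold_mean = sum(target[i] for i in train_idx) / len(train_idx)
        # "Predict": rows in the validation part get that fold's model output.
        for i in valid_idx:
            predictions[i] = fold_mean
    return predictions


target = [1.0, 2.0, 3.0, 4.0]
valid_predictions = fit_predict_mean(target, n_folds=2)
```

Every row receives a prediction from a model that did not see that row during fitting, which is why validation-part predictions can be used to compare models.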

predict(dataset)

Predict on a new dataset.

Parameters:

dataset (SparkDataset) – Dataset used for prediction.

Return type:

SparkDataset

Returns:

Dataset with predictions of all trained models.