SparkFeaturesPipeline

class sparklightautoml.pipelines.features.base.SparkFeaturesPipeline(**kwargs)[source]

Bases: FeaturesPipeline, TransformerInputOutputRoles

Abstract class.

Analyzes the train dataset and creates a composite transformer based on a subset of features. An instance can be treated like a Transformer (see LAMLTransformer) with delayed initialization (based on dataset metadata). The main method a user should define in a custom pipeline is .create_pipeline; for an example, see LGBSimpleFeatures. After a FeaturesPipeline instance is created, it is used like a transformer via its .fit_transform and .transform methods.
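The delayed-initialization pattern described above can be sketched without Spark at all. The classes below are simplified, hypothetical stand-ins (they do not come from sparklightautoml, and rows are plain dicts rather than a SparkDataset), but they show the flow: .create_pipeline inspects the train data and builds the composite transformer, which .fit_transform then applies.

```python
class SequentialTransformer:
    """Applies a list of transformers one after another (stand-in for
    SparkSequentialTransformer)."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class FillNA:
    """Replaces missing values in one column with a constant."""

    def __init__(self, column, value):
        self.column, self.value = column, value

    def transform(self, rows):
        return [
            {**r, self.column: self.value if r[self.column] is None else r[self.column]}
            for r in rows
        ]


class MyFeaturesPipeline:
    """Hypothetical custom pipeline: the transformer stays uninitialized
    until the pipeline sees the train data."""

    def __init__(self):
        self._pipeline = None

    def create_pipeline(self, train):
        # Delayed initialization: inspect dataset metadata (here just the
        # column names of the first row) and build the composite transformer.
        columns = train[0].keys()
        return SequentialTransformer([FillNA(c, 0) for c in columns])

    def fit_transform(self, train):
        self._pipeline = self.create_pipeline(train)
        return self._pipeline.transform(train)

    def transform(self, data):
        return self._pipeline.transform(data)
```

Usage, mirroring the fit-then-transform flow from the docstring:

```python
pipe = MyFeaturesPipeline()
out = pipe.fit_transform([{"a": 1, "b": None}, {"a": None, "b": 2}])
# out == [{"a": 1, "b": 0}, {"a": 0, "b": 2}]
```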

create_pipeline(train)[source]

Analyze the dataset and create a composite transformer.

Parameters:

train (SparkDataset) – Dataset with train data.

Return type:

Union[SparkBaseEstimator, SparkBaseTransformer, SparkUnionTransformer, SparkSequentialTransformer]

Returns:

Composite transformer (pipeline).
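The return type shows that the composite may be a union of parallel branches or a sequence of stages. Here is a minimal, dependency-free sketch of those two composition modes; the class names are simplified stand-ins for SparkUnionTransformer and SparkSequentialTransformer, not the real implementations, and a row is modeled as a plain dict.

```python
class SequentialTransformer:
    """Runs stages one after another, feeding each the previous output."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, row):
        for stage in self.stages:
            row = stage.transform(row)
        return row


class UnionTransformer:
    """Runs independent branches on the same input and merges their
    output columns."""

    def __init__(self, branches):
        self.branches = branches

    def transform(self, row):
        merged = {}
        for branch in self.branches:
            merged.update(branch.transform(row))
        return merged


class Scale:
    """Produces a single output column: the input column times a factor."""

    def __init__(self, column, factor):
        self.column, self.factor = column, factor

    def transform(self, row):
        return {self.column: row[self.column] * self.factor}


# A create_pipeline implementation could return a union of two
# single-stage branches, each computing one feature column:
pipeline = UnionTransformer([
    SequentialTransformer([Scale("a", 2)]),
    SequentialTransformer([Scale("b", 10)]),
])
# pipeline.transform({"a": 3, "b": 4}) == {"a": 6, "b": 40}
```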

fit_transform(train)[source]

Create the pipeline, fit it on the train data, and then transform the data.

Parameters:

train (SparkDataset) – Dataset with train data.

Return type:

SparkDataset

Returns:

Dataset with new features.