SparkLGBAdvancedPipeline

class sparklightautoml.pipelines.features.lgb_pipeline.SparkLGBAdvancedPipeline(feats_imp=None, top_intersections=5, max_intersection_depth=3, subsample=None, multiclass_te_co=3, auto_unique_co=10, output_categories=False, **kwargs)

Bases: SparkFeaturesPipeline, SparkTabularDataFeatures

Creates an advanced feature pipeline for tree-based models.

Includes:

Different handling of categorical and numeric features, depending on their role parameters.

Date handling: extracting seasonality features and creating date differences.

Creation of categorical intersections.
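The date and intersection steps listed above can be illustrated on plain Python rows. This is a minimal sketch, not the library's implementation: the helper names `date_features` and `categorical_intersections`, the `inter__` column prefix, and the use of string concatenation to combine values are all hypothetical. The assumption that intersections cover every combination of depth 2 up to `max_intersection_depth`, drawn from at most `top_intersections` categorical columns, follows the parameter descriptions below.

```python
from datetime import date
from itertools import combinations


def date_features(d: date, base: date) -> dict:
    """Hypothetical helper: seasonal parts of a date plus its difference (in days) from a base date."""
    return {
        "month": d.month,
        "weekday": d.weekday(),  # Monday == 0
        "days_since_base": (d - base).days,
    }


def categorical_intersections(row: dict, cat_cols: list, top: int, max_depth: int) -> dict:
    """Hypothetical helper: combine the values of every group of 2..max_depth columns
    taken from the first `top` categorical columns (assumed already ranked)."""
    selected = cat_cols[:top]
    result = {}
    for depth in range(2, max_depth + 1):
        for combo in combinations(selected, depth):
            name = "inter__" + "__".join(combo)
            # String concatenation stands in for however the library combines values.
            result[name] = "_".join(str(row[c]) for c in combo)
    return result


if __name__ == "__main__":
    print(date_features(date(2023, 3, 15), base=date(2023, 1, 1)))
    # {'month': 3, 'weekday': 2, 'days_since_base': 73}
    row = {"city": "Paris", "device": "mobile", "channel": "ads"}
    print(categorical_intersections(row, ["city", "device", "channel"], top=5, max_depth=3))
```

With three categorical columns and a depth limit of 3, the sketch produces three pairwise intersections and one triple intersection; lowering `top` shrinks the candidate set quadratically or faster, which is why a cap on the number of intersected columns matters.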

__init__(feats_imp=None, top_intersections=5, max_intersection_depth=3, subsample=None, multiclass_te_co=3, auto_unique_co=10, output_categories=False, **kwargs)

Parameters:

feats_imp (Optional[ImportanceEstimator]) – Feature importance estimator, providing a mapping of features to their importances.
top_intersections (int) – Maximum number of categorical features used to generate intersections.
max_intersection_depth (int) – Maximum depth (number of combined features) of a categorical intersection.
subsample (Union[int, float, None]) – Subsample used to calculate data statistics.
multiclass_te_co (int) – Cutoff on the number of classes in a multiclass task: if the number of classes is higher, target encoding is not used for categorical features.
auto_unique_co (int) – Cardinality cutoff: categorical features with more unique values than this switch to target encoding.
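The interplay between `auto_unique_co` and `multiclass_te_co` can be sketched as a small decision rule. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, the fallback to ordinal encoding is assumed, and the exact boundary comparisons (`>` for cardinality, `<=` for class count) are guesses.

```python
from typing import Dict, Optional


def choose_category_encoding(n_unique: int, auto_unique_co: int = 10) -> str:
    """Hypothetical rule: high-cardinality categories get target encoding, others ordinal."""
    return "target" if n_unique > auto_unique_co else "ordinal"


def target_encoding_allowed(task: str, n_classes: Optional[int] = None,
                            multiclass_te_co: int = 3) -> bool:
    """Hypothetical rule: in multiclass tasks, skip target encoding when there are too many classes."""
    if task in ("binary", "reg"):
        return True
    return n_classes is not None and n_classes <= multiclass_te_co


def encode_plan(n_unique_by_col: Dict[str, int], task: str, n_classes: Optional[int] = None,
                auto_unique_co: int = 10, multiclass_te_co: int = 3) -> Dict[str, str]:
    """Assign an encoding per column; fall back to ordinal when target encoding is disallowed."""
    te_ok = target_encoding_allowed(task, n_classes, multiclass_te_co)
    plan = {}
    for col, n_unique in n_unique_by_col.items():
        enc = choose_category_encoding(n_unique, auto_unique_co)
        plan[col] = enc if (enc != "target" or te_ok) else "ordinal"
    return plan


print(encode_plan({"zip": 500, "color": 4}, task="binary"))
# {'zip': 'target', 'color': 'ordinal'}
```

For a multiclass task with, say, 10 classes and the default `multiclass_te_co=3`, the same high-cardinality `zip` column would fall back to ordinal encoding in this sketch, since a per-class target encoding would multiply the number of output columns.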

create_pipeline(train)

Creates the feature pipeline for tree-based models.

Parameters: train (SparkDataset) – Dataset with training features.
Return type: Union[SparkBaseEstimator, SparkBaseTransformer, SparkUnionTransformer, SparkSequentialTransformer]
Returns: The feature pipeline, as an estimator or a transformer.
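The return type mentions union and sequential transformers. As we understand them (mirroring LightAutoML's `UnionTransformer` and `SequentialTransformer`), a union runs several branches on the same input and merges their output columns, while a sequential transformer chains steps. The plain-Python sketch below only illustrates that composition on dict rows; `union`, `sequential`, and the feature branches are hypothetical and are not the Spark classes.

```python
from typing import Callable, Dict

Row = Dict[str, object]
Step = Callable[[Row], Row]


def sequential(*steps: Step) -> Step:
    """Apply steps one after another; each sees the previous step's output."""
    def run(row: Row) -> Row:
        for step in steps:
            row = step(row)
        return row
    return run


def union(*branches: Step) -> Step:
    """Run each branch on the same input and merge the columns they produce."""
    def run(row: Row) -> Row:
        out: Row = {}
        for branch in branches:
            out.update(branch(row))
        return out
    return run


# Hypothetical feature branches
def ordinal_city(row: Row) -> Row:
    mapping = {"Paris": 1, "Berlin": 2}
    return {"ord__city": mapping.get(row["city"], 0)}  # unseen categories map to 0


def fill_age(row: Row) -> Row:
    return {"age": row["age"] if row["age"] is not None else -1}  # fill missing ages


def add_is_adult(row: Row) -> Row:
    return {**row, "is_adult": int(row["age"] >= 18)}


pipeline = union(ordinal_city, sequential(fill_age, add_is_adult))
print(pipeline({"city": "Paris", "age": 30}))
# {'ord__city': 1, 'age': 30, 'is_adult': 1}
```

Nesting a sequential chain inside a union lets each feature type (categorical, numeric, date) have its own multi-step branch while the final output is a single set of columns.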