SparkLGBAdvancedPipeline
- class sparklightautoml.pipelines.features.lgb_pipeline.SparkLGBAdvancedPipeline(feats_imp=None, top_intersections=5, max_intersection_depth=3, subsample=None, multiclass_te_co=3, auto_unique_co=10, output_categories=False, **kwargs)
Bases: SparkFeaturesPipeline, SparkTabularDataFeatures
Creates an advanced feature pipeline for tree-based models.
Includes:
Handling of categorical and numeric features according to their role parameters.
Date handling: extracting seasonal features and creating date differences.
Creation of categorical intersections.
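To make the idea of categorical intersections concrete, here is a minimal pure-Python sketch. It is not the library's implementation: it assumes that the categorical columns are already ranked by importance, that the top `top_intersections` columns are combined in groups of size 2 up to `max_intersection_depth`, and that a combined value is formed by joining the member values. The helper names `intersection_columns` and `intersect_row` and the sample column names are hypothetical.

```python
from itertools import combinations


def intersection_columns(ranked_cats, top_intersections=5, max_intersection_depth=3):
    """Return tuples of column names to combine.

    ranked_cats: categorical column names, most important first
    (this ordering is an assumption of the sketch).
    """
    top = ranked_cats[:top_intersections]
    groups = []
    # Depth 2 means pairs, depth 3 means triples, and so on.
    for depth in range(2, max_intersection_depth + 1):
        groups.extend(combinations(top, depth))
    return groups


def intersect_row(row, group, sep="_"):
    """Build one combined category value for a row (a dict of column -> value)."""
    return sep.join(str(row[col]) for col in group)


# Hypothetical columns, ranked by importance.
groups = intersection_columns(
    ["city", "device", "plan", "channel"],
    top_intersections=3,
    max_intersection_depth=3,
)

row = {"city": "Oslo", "device": "ios", "plan": "pro", "channel": "web"}
features = {"__".join(group): intersect_row(row, group) for group in groups}
```

With three top columns and a depth of 3, the sketch yields three pairs and one triple, so the number of generated features grows quickly as either parameter increases.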
- __init__(feats_imp=None, top_intersections=5, max_intersection_depth=3, subsample=None, multiclass_te_co=3, auto_unique_co=10, output_categories=False, **kwargs)
- Parameters:
feats_imp (Optional[ImportanceEstimator]) – Feature importances mapping.
top_intersections (int) – Maximum number of categories used to generate intersections.
max_intersection_depth (int) – Maximum depth of categorical intersections.
subsample (Union[int, float, None]) – Subsample used to calculate data statistics.
multiclass_te_co (int) – Cutoff on the number of classes that decides whether target encoding is used for categorical features in multiclass tasks.
auto_unique_co (int) – Cardinality cutoff; switch to target encoding for high-cardinality categorical features.
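The `subsample` parameter accepts an `int`, a `float`, or `None`. The documentation above does not say how each type is interpreted, so the sketch below only illustrates one common convention and should be read as an assumption: `None` means all rows, a `float` in (0, 1] is a fraction of rows, and an `int` is an absolute row count. The helpers `resolve_subsample` and `sample_rows` are hypothetical and are not part of sparklightautoml.

```python
import random
from typing import List, Optional, Union


def resolve_subsample(n_rows: int, subsample: Union[int, float, None]) -> int:
    """Translate a subsample setting into a number of rows (assumed convention)."""
    if subsample is None:
        # No subsampling: statistics are computed on every row.
        return n_rows
    if isinstance(subsample, float):
        if not 0.0 < subsample <= 1.0:
            raise ValueError("a float subsample must be in (0, 1]")
        # Treat a float as a fraction of the data, keeping at least one row.
        return max(1, int(n_rows * subsample))
    if subsample <= 0:
        raise ValueError("an int subsample must be positive")
    # Treat an int as an absolute row count, capped at the data size.
    return min(n_rows, subsample)


def sample_rows(rows: List, subsample: Union[int, float, None], seed: Optional[int] = 42) -> List:
    """Draw the rows on which statistics would be computed."""
    k = resolve_subsample(len(rows), subsample)
    return random.Random(seed).sample(rows, k)


stats_rows = sample_rows(list(range(1000)), 0.25)
```

Computing statistics on a subsample trades some accuracy for speed, which matters most on large datasets.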
- create_pipeline(train)
Create tree pipeline.
- Parameters:
train (SparkDataset) – Dataset with train features.
- Return type:
Union[SparkBaseEstimator, SparkBaseTransformer, SparkUnionTransformer, SparkSequentialTransformer]
- Returns:
Transformer.
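A possible way to combine the constructor and `create_pipeline` is sketched below. It uses only the import path, the constructor arguments with their default values, and the method documented on this page. The helper `build_feature_transformer` and the constant `PIPELINE_KWARGS` are hypothetical names introduced for illustration, and preparing the `SparkDataset` passed as `train` is outside the scope of this sketch.

```python
# Constructor arguments documented on this page, with their default values.
PIPELINE_KWARGS = {
    "feats_imp": None,
    "top_intersections": 5,
    "max_intersection_depth": 3,
    "subsample": None,
    "multiclass_te_co": 3,
    "auto_unique_co": 10,
    "output_categories": False,
}


def build_feature_transformer(train, **overrides):
    """Create the pipeline and return the transformer built for `train`.

    `train` is expected to be a SparkDataset with train features.
    Keyword overrides replace the defaults listed in PIPELINE_KWARGS.
    """
    # Imported lazily so that this sketch can be loaded
    # even where sparklightautoml is not installed.
    from sparklightautoml.pipelines.features.lgb_pipeline import SparkLGBAdvancedPipeline

    params = {**PIPELINE_KWARGS, **overrides}
    pipeline = SparkLGBAdvancedPipeline(**params)
    return pipeline.create_pipeline(train)
```

For example, `build_feature_transformer(train, top_intersections=3)` would request intersections over fewer categories while leaving the other arguments at their defaults.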