SparkTabularDataFeatures
- class sparklightautoml.pipelines.features.base.SparkTabularDataFeatures(**kwargs)
Bases: object
Helper class containing basic feature transformations for tabular data.
This class can be shared by all tabular feature pipelines to simplify the
.create_automl definition.
- __init__(**kwargs)
Set default parameters for tabular pipeline constructor.
- Parameters:
**kwargs (Any) – Additional parameters.
- get_datetime_diffs(train)
Difference for all datetimes with base date.
- Parameters:
train (SparkDataset) – Dataset with train data.
- Return type:
- Returns:
Transformer or None if no required features.
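To make the idea of a difference with a base date concrete, here is a minimal plain-Python sketch. It is not the library's implementation: the function name datetime_diffs, the base date, and the choice of days as the unit are all assumptions made for illustration.

```python
from datetime import datetime


def datetime_diffs(values, base_date, unit_seconds=86400.0):
    """Return (value - base_date) for each datetime, in units of unit_seconds.

    Hypothetical helper: None values are passed through unchanged.
    """
    return [
        None if v is None else (v - base_date).total_seconds() / unit_seconds
        for v in values
    ]


base = datetime(2020, 1, 1)  # assumed base date, chosen only for this example
dates = [datetime(2020, 1, 2), datetime(2020, 1, 1, 12), None]
diffs = datetime_diffs(dates, base)
print(diffs)
```

With a one-day unit, a timestamp twelve hours after the base date maps to 0.5.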
- get_datetime_seasons(train, outp_role=None)
Get season params from dates.
- Parameters:
train (SparkDataset) – Dataset with train data.
outp_role (Optional[ColumnRole]) – Role associated with output features.
- Return type:
- Returns:
Transformer or None if no required features.
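As an illustration of what season features derived from a date can look like, the following plain-Python sketch extracts a few calendar components. The function season_features, the component keys, and the output naming are hypothetical and may differ from what the library's transformer produces.

```python
from datetime import datetime


def season_features(value, seasonality=("y", "m", "wd")):
    """Extract calendar components from one datetime (illustrative only)."""
    extractors = {
        "y": lambda d: d.year,
        "m": lambda d: d.month,
        "d": lambda d: d.day,
        "wd": lambda d: d.weekday(),  # Monday == 0
        "hour": lambda d: d.hour,
    }
    return {f"season_{key}": extractors[key](value) for key in seasonality}


features = season_features(datetime(2021, 3, 15, 10, 30))
print(features)
```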
- get_numeric_data(train, feats_to_select=None, prob=None)
Select numeric features.
- Parameters:
- Return type:
- Returns:
Transformer.
- get_freq_encoding(train, feats_to_select=None)
Get frequency encoding part.
- Parameters:
train (SparkDataset) – Dataset with train data.
feats_to_select (Optional[List[str]]) – Features to handle. If None, the default filter is used.
- Return type:
- Returns:
Transformer.
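Frequency encoding replaces each category with how often it occurs in the training data. The sketch below shows the general technique in plain Python. The helper names are hypothetical, and using raw counts with 0 for unseen values is an assumption, not a statement about the library's transformer.

```python
from collections import Counter


def fit_frequency_encoder(values):
    """Count how many times each category appears in the training column."""
    return Counter(values)


def frequency_encode(values, counts, default=0):
    """Replace each category with its training count (default for unseen values)."""
    return [counts.get(v, default) for v in values]


train_col = ["a", "b", "a", "c", "a", "b"]
counts = fit_frequency_encoder(train_col)
encoded = frequency_encode(["a", "b", "z"], counts)
print(encoded)
```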
- get_ordinal_encoding(train, feats_to_select=None)
Get order encoded part.
- Parameters:
train (SparkDataset) – Dataset with train data.
feats_to_select (Optional[List[str]]) – Features to handle. If None, the default filter is used.
- Return type:
- Returns:
Transformer.
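One common reading of order encoding is to map each category to its rank in sorted order. The plain-Python sketch below follows that reading. The helper names, the 1-based ranks, and the code 0 for unseen values are assumptions for illustration rather than the library's behavior.

```python
def fit_ordinal_encoder(values):
    """Map each distinct non-null value to its 1-based rank in sorted order."""
    uniques = sorted({v for v in values if v is not None})
    return {value: rank for rank, value in enumerate(uniques, start=1)}


def ordinal_encode(values, mapping, unknown=0):
    """Replace values with their rank; unseen values get the unknown code."""
    return [mapping.get(v, unknown) for v in values]


mapping = fit_ordinal_encoder(["low", "high", "mid", "low"])
encoded = ordinal_encode(["mid", "low", "unseen"], mapping)
print(mapping, encoded)
```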
- get_categorical_raw(train, feats_to_select=None)
Get label encoded categories data.
- Parameters:
train (SparkDataset) – Dataset with train data.
feats_to_select (Optional[List[str]]) – Features to handle. If None, the default filter is used.
- Return type:
- Returns:
Transformer.
- get_target_encoder(train)
Get target encoder func for dataset.
- Parameters:
train (SparkDataset) – Dataset with train data.
- Return type:
- Returns:
Class
- get_binned_data(train, feats_to_select=None)
Get encoded quantiles of numeric features.
- Parameters:
train (SparkDataset) – Dataset with train data.
feats_to_select (Optional[List[str]]) – Features to handle. If None, the default filter is used.
- Return type:
- Returns:
Transformer.
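Quantile binning learns cut points from the training column and replaces each number with the index of the bin it falls into. The sketch below uses the standard library to show the idea. The helper names, the number of bins, and the inclusive quantile method are assumptions, and the library's transformer may choose its edges differently.

```python
from bisect import bisect_right
from statistics import quantiles


def fit_quantile_bins(values, n_bins=4):
    """Return the n_bins - 1 interior cut points of the training column."""
    return quantiles(values, n=n_bins, method="inclusive")


def bin_encode(values, edges):
    """Replace each number with the index of its bin (0 .. len(edges))."""
    return [bisect_right(edges, v) for v in values]


train_col = [1, 2, 3, 4, 5, 6, 7, 8, 9]
edges = fit_quantile_bins(train_col)
binned = bin_encode([0, 3, 5, 10], edges)
print(edges, binned)
```

In this sketch a value equal to a cut point goes to the upper bin, because bisect_right is used.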
- get_categorical_intersections(train, feats_to_select=None)
Get transformer that implements categorical intersections.
- Parameters:
train (SparkDataset) – Dataset with train data.
feats_to_select (Optional[List[str]]) – Features to handle. If None, the default filter is used.
- Return type:
- Returns:
Transformer.
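A categorical intersection combines the values of several categorical columns into one new category. The plain-Python sketch below crosses every pair of columns. The function name, the separator, and the pairwise degree are assumptions for illustration and are not taken from the library.

```python
from itertools import combinations


def categorical_intersections(rows, columns, degree=2, sep="__"):
    """Build one crossed feature per combination of `degree` columns."""
    result = []
    for row in rows:
        crossed = {}
        for combo in combinations(columns, degree):
            name = sep.join(combo)
            crossed[name] = sep.join(str(row[c]) for c in combo)
        result.append(crossed)
    return result


rows = [{"city": "Oslo", "device": "ios", "plan": "pro"}]
crossed = categorical_intersections(rows, ["city", "device", "plan"])
print(crossed[0])
```

The number of crossed features grows quickly with the number of input columns, which is one reason to restrict the set of columns being crossed.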
- get_uniques_cnt(train, feats)
Get unique value counts.
Be aware that this function uses approx_count_distinct and thus cannot return precise results.
- Parameters:
train (SparkDataset) – Dataset with train data.
- Return type:
- Returns:
Series.
- get_top_categories(train, top_n=5)
Get top categories by importance.
If feature importance is not defined, or features have the same importance, they are sorted by unique value counts. In the second case, the init param ascending_by_cardinality defines the order: ascending or descending.
- Parameters:
train (SparkDataset) – Dataset with train data.
top_n (int) – Number of top categories.
- Return type:
- Returns:
List.
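The ordering rule described for get_top_categories can be sketched in plain Python as a sort on importance with unique value counts as the tie-breaker. The function top_categories and its dictionary inputs are hypothetical. Only the name ascending_by_cardinality comes from the description above, and treating a missing importance as 0.0 is an assumption.

```python
def top_categories(features, importance, unique_counts, top_n=5,
                   ascending_by_cardinality=False):
    """Rank by importance (descending), breaking ties by unique value count."""
    sign = 1 if ascending_by_cardinality else -1
    ranked = sorted(
        features,
        # Missing importance is treated as 0.0, so such features tie with
        # each other and are ordered purely by cardinality.
        key=lambda f: (-importance.get(f, 0.0), sign * unique_counts[f]),
    )
    return ranked[:top_n]


feats = ["city", "device", "plan"]
importance = {"city": 0.2, "device": 0.2, "plan": 0.7}
unique_counts = {"city": 120, "device": 4, "plan": 3}
top = top_categories(feats, importance, unique_counts, top_n=2)
print(top)
```

Here city and device tie on importance, so the cardinality order decides which of them makes the top two.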