Preproc
class ColumnPreproc
Preprocessing instructions for a column.
Arguments:
special_values - A sequence of special values to handle during preprocessing.
impute_nan - A flag indicating whether to impute NaN values during preprocessing. If True, NaN values will not be sampled during the generation of synthetic data.
non_sample_values - A sequence of values that should not be sampled during the generation of synthetic data.
protection - A Protection object or boolean flag indicating whether to apply protection to the column. If boolean, the default protection is applied; otherwise the Protection object configures the protection.
class TabularPreproc
TabularPreproc.from_schema
Build a preprocessor for tabular data from the Schema.
Arguments:
schema - A Schema object.
preprocessors - A dictionary containing preprocessing instructions for each column in the schema. Keys are table names; values are dictionaries with column names as keys and preprocessing instructions as values. Preprocessing instructions can be instances of ColumnPreproc, a column preprocessor, or None. If None, the default preprocessor will be instantiated based on the Column type defined in the Schema.
Returns:
A TabularPreproc object.
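The nested shape of the preprocessors argument described above can be sketched as follows. This is an illustration only: the ColumnPreproc class here is a hypothetical stand-in that mirrors the documented argument names (its defaults are assumptions, not the library's), and the table and column names are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence, Tuple


@dataclass
class ColumnPreproc:
    """Hypothetical stand-in mirroring the documented arguments.

    The default values below are assumptions made for this sketch.
    """

    special_values: Sequence[Any] = ()
    impute_nan: bool = False
    non_sample_values: Sequence[Any] = ()
    protection: Any = False  # a bool here; the real argument also accepts a Protection object


# Outer keys are table names, inner keys are column names.
# A value of None requests the default preprocessor for that column type.
# Table and column names are hypothetical.
preprocessors: Dict[str, Dict[str, Optional[ColumnPreproc]]] = {
    "players": {
        "age": ColumnPreproc(impute_nan=True),
        "country": ColumnPreproc(non_sample_values=["unknown"], protection=True),
        "height": None,
    },
    "season": {
        "points": ColumnPreproc(special_values=[-1]),
    },
}


def columns_using_default(
    instructions: Dict[str, Dict[str, Optional[ColumnPreproc]]],
) -> List[Tuple[str, str]]:
    """List the (table, column) pairs whose instruction is None."""
    return sorted(
        (table, column)
        for table, columns in instructions.items()
        for column, instruction in columns.items()
        if instruction is None
    )
```

With the real library, a dictionary of this shape would be passed as the preprocessors argument of TabularPreproc.from_schema, using the library's own ColumnPreproc in place of the stand-in.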
TabularPreproc.fit
Fit the preprocessor to the given RelationalData.
Arguments:
data - The RelationalData to fit the preprocessor to.
Returns:
The fitted TabularPreproc object.
class TextPreproc
TextPreproc.from_schema_table
Build a preprocessor for the text columns of a table from the Schema.
Arguments:
schema - A Schema object.
table - Name of the target table in the schema that contains text columns.
Returns:
A TextPreproc object.
TextPreproc.from_tabular
Build a preprocessor for the text columns of a table from the TabularPreproc used for the tabular data.
Arguments:
preproc - A TabularPreproc object used for the tabular part of the data.
table - Name of the target table in the schema that contains text columns.
Returns:
A TextPreproc object.
TextPreproc.fit
Fit the preprocessor to the given RelationalData.
Arguments:
data - The RelationalData to fit the preprocessor to.
Returns:
The fitted TextPreproc object.
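Because both fit methods return the fitted object, the calls documented on this page can be chained. The helper below is a minimal sketch of one possible call order, using only the method and argument names listed above. The function name fit_preprocessors is hypothetical, and the two classes are passed in as parameters so that no import path is assumed. Fitting the tabular preprocessor before calling from_tabular is also an assumption; this page does not state whether it is required.

```python
def fit_preprocessors(schema, data, preprocessors, text_table, tabular_cls, text_cls):
    """Build and fit a tabular and a text preprocessor (hypothetical helper).

    tabular_cls and text_cls are expected to be TabularPreproc and
    TextPreproc; they are parameters so this sketch assumes no import path.
    """
    # Build the tabular preprocessor from the schema, then fit it.
    # fit returns the fitted object, so the two calls can be chained.
    tabular = tabular_cls.from_schema(
        schema=schema,
        preprocessors=preprocessors,
    ).fit(data=data)

    # Derive the text preprocessor for one table from the tabular one.
    # Assumption: the tabular preprocessor is fitted first.
    text = text_cls.from_tabular(preproc=tabular, table=text_table).fit(data=data)

    return tabular, text
```

When no TabularPreproc is available, TextPreproc.from_schema_table(schema=..., table=...) is the documented alternative way to build the text preprocessor.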