Preproc
class Categorical
Categorical.__init__
Categorical column preprocessor treating categories as ordinal values.
Arguments:
base - The base in which to represent the ordinal values associated with the categories.
special_values - A sequence of values to be handled separately as categories.
impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
non_sample_values - A sequence of values that should not be sampled.
protection - A Protection object.
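To illustrate what "treating categories as ordinal values" in a given base can mean, here is a minimal standalone sketch. It is not the library's implementation: the helper names `to_digits` and `encode_categories`, the ordering of categories by first appearance, and the fixed-width digit layout are all assumptions made for the example.

```python
def to_digits(value: int, base: int, width: int) -> list[int]:
    """Return the fixed-width digits of a non-negative integer, most significant first."""
    if base < 2:
        raise ValueError("base must be at least 2")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, base)
        digits.append(remainder)
    if value:
        raise ValueError("value does not fit in the requested width")
    return digits[::-1]


def encode_categories(values, base: int = 10):
    """Map each category to an ordinal and write that ordinal in `base`.

    Assumption: ordinals follow the order of first appearance.
    """
    categories = list(dict.fromkeys(values))  # unique values, order preserved
    index = {category: i for i, category in enumerate(categories)}
    # Smallest width such that every ordinal fits in `width` digits.
    width = 1
    while base ** width < len(categories):
        width += 1
    return categories, [to_digits(index[v], base, width) for v in values]


categories, encoded = encode_categories(["a", "b", "c", "a", "d"], base=2)
```

With four distinct categories and `base=2`, each value is written with two binary digits.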
class Coordinates
Coordinates.__init__
Coordinate column preprocessor.
Arguments:
base - Base in which to represent the coordinate values.
digits - Number of digits to keep in the coordinate values.
special_values - A sequence of values to be handled separately as categories.
impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
non_sample_values - A sequence of values that should not be sampled.
protection - A Protection object.
class Date
Date.__init__
Date column preprocessor respecting a weekly periodicity.
Arguments:
fmt - Datetime format. If None, it will be automatically inferred.
special_values - A sequence of values to be handled separately as categories.
impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
non_sample_values - A sequence of values that should not be sampled.
protection - A Protection object.
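One way to picture a representation that respects weekly periodicity is to split each date into a week index and a day of the week. The sketch below uses only the standard library and is an illustration under stated assumptions, not the library's actual encoding: the function names, the reference Monday `ORIGIN`, and the strptime-style `fmt` default are all hypothetical.

```python
from datetime import date, datetime, timedelta

# Assumption: weeks are counted from a reference Monday (1970-01-05).
ORIGIN = date(1970, 1, 5)


def split_weekly(text: str, fmt: str = "%Y-%m-%d") -> tuple[int, int]:
    """Parse a date string and return (week index, weekday), Monday being 0."""
    parsed = datetime.strptime(text, fmt).date()
    return divmod((parsed - ORIGIN).days, 7)


def join_weekly(week: int, weekday: int) -> date:
    """Rebuild the date from its (week index, weekday) pair."""
    return ORIGIN + timedelta(days=7 * week + weekday)


week, weekday = split_weekly("2024-02-29")
restored = join_weekly(week, weekday)
```

Keeping the weekday as its own component makes patterns such as weekday versus weekend activity explicit, while the week index carries the long-term position in time.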
class Time
Time.__init__
Time column preprocessor.
Arguments:
fmt - Time format. If None, it will be automatically inferred.
special_values - A sequence of values to be handled separately as categories.
impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
non_sample_values - A sequence of values that should not be sampled.
protection - A Protection object.
class Datetime
Datetime.__init__
Datetime column preprocessor.
Arguments:
date - A Date preprocessor or None. If None, the default Date object is used.
time - A Time preprocessor or None. If None, the default Time object is used.
fmt - Datetime format. If None, it will be automatically inferred.
special_values - A sequence of values to be handled separately as categories.
impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
non_sample_values - A sequence of values that should not be sampled.
protection - A Protection object.
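Since the Datetime preprocessor is configured with a Date and a Time preprocessor, it may help to see how a datetime value splits into those two parts. The sketch below uses only the standard library; the function names and the default `fmt` string are assumptions for illustration and say nothing about how the library handles the two parts internally.

```python
from datetime import date, datetime, time


def split_datetime(text: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> tuple[date, time]:
    """Parse a datetime string and return its date part and time part."""
    parsed = datetime.strptime(text, fmt)
    return parsed.date(), parsed.time()


def join_datetime(date_part: date, time_part: time) -> datetime:
    """Recombine the two parts into a single datetime."""
    return datetime.combine(date_part, time_part)


date_part, time_part = split_datetime("2024-03-15 08:30:00")
restored = join_datetime(date_part, time_part)
```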
class Integer
Integer.__init__
Integer column preprocessor.
Arguments:
base - Base in which to represent the integer values.
special_values - A sequence of values to be handled separately as categories.
impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
non_sample_values - A sequence of values that should not be sampled.
protection - A Protection object.
class Numeric
Numeric.__init__
Numeric column preprocessor.
Arguments:
base - The base of the numeric system.
max_digits - Number of digits to keep in the numeric values.
special_values - A sequence of values to be handled separately as categories.
impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
non_sample_values - A sequence of values that should not be sampled.
protection - A Protection object.
class ColumnPreproc
Preprocessing instructions for a column.
Arguments:
special_values - A sequence of special values to handle during preprocessing.
impute_nan - A flag indicating whether to impute NaN values during preprocessing. If True, NaN values will not be sampled during the generation of synthetic data.
non_sample_values - A sequence of values that should not be sampled during the generation of synthetic data.
protection - A Protection object or boolean flag indicating whether to apply protection to the column. If boolean, the default protection is applied, otherwise the Protection object configures the protection.
class TabularPreproc
TabularPreproc.__init__
Preprocessor for tabular data.
Arguments:
schema - A Schema object.
preprocessors - A dictionary containing preprocessing instructions for each column in the schema. Keys are table names, values are dictionaries with column names as keys and preprocessing instructions as values. Preprocessing instructions can be instances of ColumnPreproc, a column preprocessor, or None. If None, the default preprocessor will be instantiated based on the Column type defined in the Schema.
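The nested shape of the preprocessors dictionary (table name, then column name, then instructions) can be sketched with plain Python stand-ins. Everything below is hypothetical: `ColumnPreprocSketch` only mirrors the four documented argument names with assumed defaults, the schema is a plain dictionary rather than a Schema object, and treating an omitted column like None is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class ColumnPreprocSketch:
    """Hypothetical stand-in mirroring the documented ColumnPreproc arguments."""
    special_values: tuple = ()
    impute_nan: bool = False
    non_sample_values: tuple = ()
    protection: bool = False


# Hypothetical stand-in for a schema: table name -> column name -> column type.
schema = {
    "customers": {"age": "integer", "city": "categorical"},
    "orders": {"placed_on": "date", "amount": "numeric"},
}

# Keys are table names; values map column names to instructions or None.
preprocessors = {
    "customers": {
        "age": ColumnPreprocSketch(special_values=(-1,), impute_nan=True),
        "city": None,  # None: fall back to the default for the column type
    },
    "orders": {"amount": ColumnPreprocSketch(protection=True)},
}


def resolve(schema, preprocessors):
    """Fill in a type-based default wherever no instructions are given."""
    resolved = {}
    for table, columns in schema.items():
        table_config = preprocessors.get(table, {})
        resolved[table] = {}
        for column, column_type in columns.items():
            config = table_config.get(column)
            # Assumption: an omitted column is treated like an explicit None.
            resolved[table][column] = config if config is not None else f"default:{column_type}"
    return resolved


resolved = resolve(schema, preprocessors)
```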
class TextPreproc
TextPreproc.__init__
Preprocessor for text columns of a table which is part of a relational structure.
Arguments:
schema - A Schema object.
table - Name of the target table in the schema that contains text columns.