Preproc

class ColumnPreproc

Preprocessing instructions for a column.

Arguments:

special_values - A sequence of special values to handle during preprocessing.
impute_nan - A flag indicating whether to impute NaN values during preprocessing. If True, NaN values are not sampled when generating synthetic data.
non_sample_values - A sequence of values that should not be sampled during the generation of synthetic data.
protection - A Protection object or a boolean flag. If True, the default protection is applied to the column; if False, no protection is applied; if a Protection object is given, it configures the protection.
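The arguments above can be pictured with a small stand-in. The sketch below is hypothetical: `ColumnPreprocSketch` and `is_sampleable` are illustrative names, not part of this library. The rule it encodes (a value is excluded from sampling if it is NaN and impute_nan is set, or if it appears in non_sample_values) is an assumption drawn from the argument descriptions, not the library's implementation.

```python
import math
from collections.abc import Sequence
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ColumnPreprocSketch:
    """Hypothetical stand-in mirroring the ColumnPreproc arguments."""
    special_values: Sequence[Any] = field(default_factory=tuple)
    impute_nan: bool = False
    non_sample_values: Sequence[Any] = field(default_factory=tuple)
    # True/False toggles a default protection; any other object stands in for a Protection config.
    protection: Any = False


def is_nan(value: Any) -> bool:
    return isinstance(value, float) and math.isnan(value)


def is_sampleable(value: Any, cfg: ColumnPreprocSketch) -> bool:
    """Assumed rule: may `value` appear in generated synthetic data?"""
    if cfg.impute_nan and is_nan(value):
        return False  # NaNs are imputed, so they are never sampled
    return value not in cfg.non_sample_values


cfg = ColumnPreprocSketch(special_values=(-1,), impute_nan=True, non_sample_values=("N/A",))
print([v for v in [1.5, float("nan"), "N/A", -1] if is_sampleable(v, cfg)])  # [1.5, -1]
```

With the default configuration (impute_nan=False, no non_sample_values) every value, NaN included, remains eligible for sampling in this sketch.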

class TabularPreproc

TabularPreproc.from_schema

@classmethod
def from_schema(
    cls: type['TabularPreproc'],
    schema: Schema,
    preprocessors: dict[str, dict[str, ColumnPreproc | Column | None]]
    | None = None
) -> 'TabularPreproc'

Build a preprocessor for tabular data from the Schema.

Arguments:

schema - A Schema object.
preprocessors - A dictionary containing preprocessing instructions for each column in the schema. Keys are table names; values are dictionaries mapping column names to preprocessing instructions. An instruction can be a ColumnPreproc instance, a column preprocessor (Column), or None. If None, the default preprocessor is instantiated based on the Column type defined in the Schema.

Returns:

A TabularPreproc object.
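The preprocessors argument is a two-level mapping: table name, then column name, then instruction. The sketch below uses hypothetical table and column names and plain strings in place of real ColumnPreproc and Column objects; `resolve_instructions` is an illustrative helper, not part of the library. It shows a None entry falling back to a default chosen from the column type. Treating columns absent from the mapping the same way is an assumption.

```python
from typing import Optional

# Hypothetical schema: table name -> column name -> column type name.
SCHEMA = {
    "customers": {"id": "integer", "age": "numeric", "country": "categorical"},
    "orders": {"id": "integer", "amount": "numeric"},
}

# Shape of the `preprocessors` argument; strings stand in for ColumnPreproc/Column objects.
PREPROCESSORS: dict[str, dict[str, Optional[str]]] = {
    "customers": {"age": "custom-age-preproc", "country": None},
}


def resolve_instructions(
    schema: dict[str, dict[str, str]],
    preprocessors: Optional[dict[str, dict[str, Optional[str]]]] = None,
) -> dict[str, dict[str, str]]:
    """Return one instruction per column, defaulting on the column type."""
    preprocessors = preprocessors or {}
    resolved: dict[str, dict[str, str]] = {}
    for table, columns in schema.items():
        given = preprocessors.get(table, {})
        resolved[table] = {}
        for column, col_type in columns.items():
            # None if set explicitly to None or (assumption) missing from the mapping.
            instruction = given.get(column)
            resolved[table][column] = (
                instruction if instruction is not None else f"default-{col_type}"
            )
    return resolved


print(resolve_instructions(SCHEMA, PREPROCESSORS)["customers"])
# {'id': 'default-integer', 'age': 'custom-age-preproc', 'country': 'default-categorical'}
```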

TabularPreproc.fit

def fit(data: RelationalData) -> 'TabularPreproc'

Fit the preprocessor to the given RelationalData.

Arguments:

data - The RelationalData to fit the preprocessor to.

Returns:

The fitted TabularPreproc object.
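Because fit returns the fitted preprocessor, construction and fitting can be chained in one expression. The toy class below is hypothetical and only mimics that convention on a plain dictionary of column values; it does not reproduce what TabularPreproc actually learns. Here it simply records the minimum and maximum of each listed column.

```python
from typing import Optional


class ToyPreproc:
    """Hypothetical preprocessor mimicking the build-then-fit convention."""

    def __init__(self, columns: list[str]) -> None:
        self.columns = columns
        self.ranges: Optional[dict[str, tuple[float, float]]] = None  # set by fit()

    @classmethod
    def from_columns(cls, columns: list[str]) -> "ToyPreproc":
        return cls(columns)

    def fit(self, data: dict[str, list[float]]) -> "ToyPreproc":
        # Learn per-column statistics, then return self so calls can be chained.
        self.ranges = {c: (min(data[c]), max(data[c])) for c in self.columns}
        return self


preproc = ToyPreproc.from_columns(["age"]).fit({"age": [31, 45, 22]})
print(preproc.ranges)  # {'age': (22, 45)}
```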

class TextPreproc

TextPreproc.from_schema_table

@classmethod
def from_schema_table(cls: type['TextPreproc'], schema: Schema,
                      table: str) -> 'TextPreproc'

Build a preprocessor for the text columns of a table from the Schema.

Arguments:

schema - A Schema object.
table - Name of the target table in the schema that contains text columns.

Returns:

A TextPreproc object.
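from_schema_table restricts attention to a single table and its text columns. A hypothetical sketch of that selection step, using a plain dictionary schema in which each column is tagged with a type name; the "text" tag and the `text_columns` helper are illustrative assumptions, not the library's Schema API.

```python
# Hypothetical schema: table name -> column name -> column type name.
SCHEMA = {
    "products": {"id": "integer", "name": "categorical", "description": "text"},
    "reviews": {"id": "integer", "product_id": "integer", "title": "text", "body": "text"},
}


def text_columns(schema: dict[str, dict[str, str]], table: str) -> list[str]:
    """Return the text columns of `table`; raises KeyError if the table is unknown."""
    return [col for col, col_type in schema[table].items() if col_type == "text"]


print(text_columns(SCHEMA, "reviews"))  # ['title', 'body']
```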

TextPreproc.from_tabular

@classmethod
def from_tabular(cls: type['TextPreproc'], preproc: TabularPreproc[_AP],
                 table: str) -> 'TextPreproc'

Build a preprocessor for the text columns of a table from the TabularPreproc used for the tabular data.

Arguments:

preproc - A TabularPreproc object used for the tabular part of the data.
table - Name of the target table in the schema that contains text columns.

Returns:

A TextPreproc object.
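A hypothetical sketch of deriving a text preprocessor from an existing tabular one: the text side takes the per-column settings of the requested table's text columns from the tabular side rather than rebuilding them from the schema. `TabularSketch`, `TextSketch` and their attributes are illustrative names; whether the real from_tabular shares or copies these settings is an assumption, not something stated here.

```python
from dataclasses import dataclass


@dataclass
class TabularSketch:
    """Hypothetical tabular preprocessor: per-table column types and settings."""
    types: dict[str, dict[str, str]]      # table -> column -> type name
    settings: dict[str, dict[str, dict]]  # table -> column -> settings


@dataclass
class TextSketch:
    """Hypothetical text preprocessor for the text columns of one table."""
    table: str
    settings: dict[str, dict]  # column -> settings

    @classmethod
    def from_tabular(cls, preproc: TabularSketch, table: str) -> "TextSketch":
        # Reuse the tabular settings of the text columns instead of rebuilding them.
        text_cols = [c for c, t in preproc.types[table].items() if t == "text"]
        return cls(table, {c: preproc.settings[table][c] for c in text_cols})


tab = TabularSketch(
    types={"reviews": {"id": "integer", "body": "text"}},
    settings={"reviews": {"id": {}, "body": {"impute_nan": True}}},
)
text = TextSketch.from_tabular(tab, "reviews")
print(text.settings)  # {'body': {'impute_nan': True}}
```

In this sketch the text preprocessor holds the very same settings objects as the tabular one, so the two cannot drift apart.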

TextPreproc.fit

def fit(data: RelationalData) -> 'TextPreproc'

Fit the preprocessor to the given RelationalData.

Arguments:

data - The RelationalData to fit the preprocessor to.

Returns:

The fitted TextPreproc object.