Preproc

class Categorical

Categorical.__init__

def __init__(
base: int = 1024,
special_values: Sequence = (),
impute_nan: bool = False,
non_sample_values: Sequence = (),
protection: Protection = Protection()
) -> None

Categorical column preprocessor treating categories as ordinal values.

Arguments:

  • base - The base in which to represent the ordinal values associated with the categories.
  • special_values - A sequence of values to be handled separately as categories.
  • impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
  • non_sample_values - A sequence of values that should not be sampled.
  • protection - A Protection object.
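To make the role of base concrete, here is a minimal sketch of representing ordinal category codes as digits in a given base. It only illustrates the idea and is not the library's implementation: the helpers to_digits and encode_categories are hypothetical, and assigning ordinals by sorted order is an assumption.

```python
from collections.abc import Sequence


def to_digits(value: int, base: int) -> list[int]:
    """Digits of a non-negative integer in `base`, most significant first."""
    if base < 2:
        raise ValueError("base must be at least 2")
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = [value % base]
    value //= base
    while value:
        digits.append(value % base)
        value //= base
    return digits[::-1]


def encode_categories(values: Sequence[str], base: int = 1024) -> dict[str, list[int]]:
    """Give each distinct category an ordinal (sorted order, an assumption)
    and expand that ordinal into digits in `base`."""
    ordinals = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return {cat: to_digits(i, base) for cat, i in ordinals.items()}


# A tiny base makes the multi-digit case visible with only three categories.
codes = encode_categories(["red", "green", "blue", "green"], base=2)
```

With the default base of 1024, any column with at most 1024 distinct categories would need a single digit under this scheme.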

class Coordinates

Coordinates.__init__

def __init__(
base: int = 10,
digits: int = 10,
special_values: Sequence = (),
impute_nan: bool = False,
non_sample_values: Sequence = (),
protection: Protection = Protection()
) -> None

Coordinate column preprocessor.

Arguments:

  • base - Base in which to represent the coordinate values.
  • digits - Number of digits to keep in the coordinate values.
  • special_values - A sequence of values to be handled separately as categories.
  • impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
  • non_sample_values - A sequence of values that should not be sampled.
  • protection - A Protection object.

class Date

Date.__init__

def __init__(
fmt: str | None = None,
special_values: Sequence = (),
impute_nan: bool = False,
non_sample_values: Sequence = (),
protection: Protection = Protection()
) -> None

Date column preprocessor respecting a weekly periodicity.

Arguments:

  • fmt - Datetime format. If None, it will be automatically inferred.
  • special_values - A sequence of values to be handled separately as categories.
  • impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
  • non_sample_values - A sequence of values that should not be sampled.
  • protection - A Protection object.
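One way to picture "respecting a weekly periodicity" is to split a date into a week index and a day of the week. The sketch below does that with the standard library only. It is an illustration under stated assumptions, not the preprocessor's actual encoding: the reference Monday and the helper split_weekly are hypothetical.

```python
from datetime import date

# Arbitrary reference Monday, chosen only for this illustration.
REFERENCE_MONDAY = date(1970, 1, 5)


def split_weekly(d: date) -> tuple[int, int]:
    """Return (weeks since the reference Monday, day of week with Monday = 0)."""
    days = (d - REFERENCE_MONDAY).days
    # divmod floors toward negative infinity, so the day of week stays
    # in 0..6 even for dates before the reference.
    return divmod(days, 7)


week, weekday = split_weekly(date(1970, 1, 12))
```

The second component always agrees with date.weekday(), so dates falling on the same weekday share that part of the representation.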

class Time

Time.__init__

def __init__(
fmt: str | None = None,
special_values: Sequence = (),
impute_nan: bool = False,
non_sample_values: Sequence = (),
protection: Protection = Protection()
) -> None

Time column preprocessor.

Arguments:

  • fmt - Datetime format. If None, it will be automatically inferred.
  • special_values - A sequence of values to be handled separately as categories.
  • impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
  • non_sample_values - A sequence of values that should not be sampled.
  • protection - A Protection object.

class Datetime

Datetime.__init__

def __init__(
date: Date | None = None,
time: Time | None = None,
fmt: str | None = None,
special_values: Sequence = (),
impute_nan: bool = False,
non_sample_values: Sequence = (),
protection: Protection = Protection()
) -> None

Datetime column preprocessor.

Arguments:

  • date - A Date preprocessor or None. If None, the default Date object is used.
  • time - A Time preprocessor or None. If None, the default Time object is used.
  • fmt - Datetime format. If None, it will be automatically inferred.
  • special_values - A sequence of values to be handled separately as categories.
  • impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
  • non_sample_values - A sequence of values that should not be sampled.
  • protection - A Protection object.

class Integer

Integer.__init__

def __init__(
base: int = 10,
special_values: Sequence = (),
impute_nan: bool = False,
non_sample_values: Sequence = (),
protection: Protection = Protection()
) -> None

Integer column preprocessor.

Arguments:

  • base - Base in which to represent the integer values.
  • special_values - A sequence of values to be handled separately as categories.
  • impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
  • non_sample_values - A sequence of values that should not be sampled.
  • protection - A Protection object.

class Numeric

Numeric.__init__

def __init__(
base: int = 10,
max_digits: int = 12,
special_values: Sequence = (),
impute_nan: bool = False,
non_sample_values: Sequence = (),
protection: Protection = Protection()
) -> None

Numeric column preprocessor.

Arguments:

  • base - Base in which to represent the numeric values.
  • max_digits - Maximum number of digits to keep in the numeric values.
  • special_values - A sequence of values to be handled separately as categories.
  • impute_nan - Whether to impute NaN values. If True, NaN values are replaced with other plausible values.
  • non_sample_values - A sequence of values that should not be sampled.
  • protection - A Protection object.
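One plausible reading of max_digits, for base 10, is a cap on the number of significant digits. The sketch below illustrates that reading with the standard-library decimal module. This is an assumption about the semantics, not the preprocessor's implementation, and keep_digits is a hypothetical helper.

```python
from decimal import Context, Decimal


def keep_digits(value: str, max_digits: int = 12) -> Decimal:
    """Round a decimal string to at most `max_digits` significant digits."""
    # The context precision counts significant digits, not decimal places.
    ctx = Context(prec=max_digits)
    return ctx.create_decimal(value)


rounded = keep_digits("3.14159265358979", max_digits=5)
```

Values that already fit within max_digits pass through unchanged, so the cap only affects long or high-precision inputs.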

class ColumnPreproc

Preprocessing instructions for a column.

Arguments:

  • special_values - A sequence of special values to handle during preprocessing.
  • impute_nan - A flag indicating whether to impute NaN values during preprocessing. If True, NaN values will not be sampled during the generation of synthetic data.
  • non_sample_values - A sequence of values that should not be sampled during the generation of synthetic data.
  • protection - A Protection object or boolean flag indicating whether to apply protection to the column. If boolean, the default protection is applied, otherwise the Protection object configures the protection.

class TabularPreproc

TabularPreproc.__init__

def __init__(
schema: Schema,
preprocessors: dict[str, dict[str, ColumnPreproc | ArColumn | None]] | None = None
) -> None

Preprocessor for tabular data.

Arguments:

  • schema - A Schema object.
  • preprocessors - A dictionary containing preprocessing instructions for each column in the schema. Keys are table names; values are dictionaries with column names as keys and preprocessing instructions as values. Preprocessing instructions can be instances of ColumnPreproc, a column preprocessor, or None. If None, the default preprocessor is instantiated based on the Column type defined in the Schema.
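The nested shape of the preprocessors argument can be sketched as follows. To keep the example self-contained, ColumnPreprocStub is a hypothetical stand-in that mirrors some of the documented ColumnPreproc arguments; it is not the library class. The table and column names are also made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnPreprocStub:
    """Hypothetical stand-in for ColumnPreproc; not the library class."""

    special_values: tuple = ()
    impute_nan: bool = False
    non_sample_values: tuple = ()


# Outer keys are table names, inner keys are column names.
# None means "use the default preprocessor for the column type in the schema".
preprocessors: dict[str, dict[str, Optional[ColumnPreprocStub]]] = {
    "players": {
        "height": ColumnPreprocStub(impute_nan=True),
        "birth_date": None,
    },
    "games": {
        "score": ColumnPreprocStub(special_values=(-1,)),
    },
}


def columns_using_default(spec: dict) -> list[tuple[str, str]]:
    """List the (table, column) pairs left to the default preprocessor."""
    return [
        (table, column)
        for table, columns in spec.items()
        for column, instruction in columns.items()
        if instruction is None
    ]


defaults = columns_using_default(preprocessors)
```

In real use, the inner values would be ColumnPreproc instances, column preprocessors such as those documented above, or None, and the dictionary would be passed together with a Schema to TabularPreproc.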

class TextPreproc

TextPreproc.__init__

def __init__(schema: Schema, table: str) -> None

Preprocessor for text columns of a table which is part of a relational structure.

Arguments:

  • schema - A Schema object.
  • table - Name of the target table in the schema that contains text columns.