Models

class TabularModel

TabularModel.build

@classmethod
def build(cls: type['TabularModel'],
preproc: TabularPreproc,
size: str | Size | TabularModelSize,
block: str | None = None,
dropout: float | None = 0.12) -> 'TabularModel'

Tabular model to generate synthetic tabular relational data.

Arguments:

  • preproc - A TabularPreproc object.
  • size - The size configuration of the model. Can be a TabularModelSize object, a Size object, or the string representation of a Size.
  • block - The block type. The possible values depend on whether the data is single table or multi table. For a single table, either ‘free’ (default), ‘causal’, or ‘lstm’. For multi table data, either ‘free’ (default) or ‘lstm’.
  • dropout - The dropout probability.

Returns:

A TabularModel instance.
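
The dependence of block on the table layout can be summarized as a small lookup. The sketch below is illustrative only: ALLOWED_BLOCKS and resolve_block are hypothetical names that are not part of the library, and the accepted values are copied from the argument description above.

```python
from typing import Optional

# Hypothetical helper, not part of the library: it only encodes the
# block values listed in the argument description of TabularModel.build.
ALLOWED_BLOCKS = {
    "single": ("free", "causal", "lstm"),
    "multi": ("free", "lstm"),
}


def resolve_block(block: Optional[str], multi_table: bool) -> str:
    """Return the effective block type, falling back to the documented default."""
    allowed = ALLOWED_BLOCKS["multi" if multi_table else "single"]
    if block is None:
        # 'free' is documented as the default for both layouts.
        return "free"
    if block not in allowed:
        raise ValueError(f"block must be one of {allowed}, got {block!r}")
    return block
```

With this check, resolve_block("causal", multi_table=True) raises a ValueError, since 'causal' is listed only for single table data.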

TabularModel.generate

def generate(n_samples: int | None = None,
ctx: pd.DataFrame | None = None,
batch_size: int = 0,
max_block_size: int = 0,
temp: float = 1.0) -> RelationalData

Generate synthetic relational data.

Arguments:

  • n_samples - Desired number of samples in the root table. Must be given if and only if ctx is not provided.
  • ctx - The leading columns of the root table from which conditional generation starts. If provided, n_samples must not be given.
  • batch_size - Batch size used during generation. If 0, generate all data in a single batch.
  • max_block_size - Maximum length for each generated sample. Active only for multi-table datasets. If 0, no limit is enforced.
  • temp - Temperature parameter for sampling.

Returns:

A RelationalData object.
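
The argument rules above (exactly one of n_samples and ctx, and batch_size equal to 0 meaning a single batch) can be sketched as plain Python. The helpers root_rows and plan_batches are hypothetical and do not exist in the library. The sketch also assumes that, when a context is given, one root row is generated per context row, which the description above does not state explicitly.

```python
from typing import List, Optional


def root_rows(n_samples: Optional[int], ctx_len: Optional[int]) -> int:
    """Number of root rows to generate; exactly one argument must be given.

    ctx_len stands in for the number of rows of the ctx DataFrame
    (assumption: one generated root row per context row).
    """
    if (n_samples is None) == (ctx_len is None):
        raise ValueError("provide exactly one of n_samples and ctx")
    return n_samples if n_samples is not None else ctx_len


def plan_batches(n_rows: int, batch_size: int = 0) -> List[int]:
    """Split n_rows into batch lengths; batch_size == 0 means one batch."""
    if n_rows < 0 or batch_size < 0:
        raise ValueError("n_rows and batch_size must be non-negative")
    if n_rows == 0:
        return []
    if batch_size == 0 or batch_size >= n_rows:
        return [n_rows]
    full, rest = divmod(n_rows, batch_size)
    # All full batches first, then one shorter batch for the remainder.
    return [batch_size] * full + ([rest] if rest else [])
```

A nonzero batch_size trades speed for a lower peak memory footprint, since fewer rows are processed at once.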

class TextModel

TextModel.build

@classmethod
def build(cls: type['TextModel'],
preproc: TextPreproc,
size: str | Size | TextModelSize,
block_size: int,
dropout: float | None = 0.12) -> 'TextModel'

Text model to generate synthetic text columns of a table which is part of a relational structure.

Arguments:

  • preproc - A TextPreproc object.
  • size - The size configuration of the model. Can be a TextModelSize object, a Size object, or the string representation of a Size.
  • block_size - Maximum text sequence length that the model can process.
  • dropout - The dropout probability.

Returns:

A TextModel instance.

TextModel.build_from_pretrained

@classmethod
def build_from_pretrained(cls: type['TextModel'],
preproc: TextPreproc,
path: Path | str,
block_size: int | None = None) -> 'TextModel'

Build a text model from a pretrained model.

Arguments:

  • preproc - A TextPreproc object.
  • path - The path to the checkpoint of the pre-trained model.
  • block_size - Maximum text sequence length that the model can process during fine-tuning.

Returns:

A TextModel instance with the weights loaded from the pre-trained model.

TextModel.generate

def generate(data: RelationalData,
batch_size: int = 0,
max_text_len: int = 0,
temp: float = 1.0) -> RelationalData

Generate text columns in the current table.

Arguments:

  • data - A RelationalData object containing synthetic data.
  • batch_size - Batch size used during generation. If 0, generate all data in a single batch.
  • max_text_len - Maximum length for the generated text. If 0, the maximum possible value is used, namely the value of the TabularModel.max_block_size attribute.
  • temp - Temperature parameter for sampling.

Returns:

A RelationalData object.
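
The temp argument is described only as a temperature parameter for sampling. The sketch below shows the common convention, in which logits are divided by the temperature before the softmax. It is an assumption that the models here follow this convention, and softmax_with_temperature and sample_index are illustrative names, not library functions.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax_with_temperature(logits: Sequence[float], temp: float = 1.0) -> List[float]:
    """Turn logits into probabilities after dividing them by temp."""
    if temp <= 0:
        raise ValueError("temp must be positive")
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits: Sequence[float], temp: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one index according to the temperature-scaled distribution."""
    if rng is None:
        rng = random.Random()
    probs = softmax_with_temperature(logits, temp)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Under this convention, values of temp below 1.0 concentrate probability on the most likely outcomes, while values above 1.0 flatten the distribution and produce more varied samples.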

class Size

Enumeration class representing different model sizes. Supported sizes are: SMALL, MEDIUM and LARGE.

class TabularModelSize

Model size for TabularModel objects.

Arguments:

  • n_layers - Number of internal layers.
  • h - Number of heads.
  • d - Size of the internal dimension.

TabularModelSize.from_size

@classmethod
def from_size(cls, size: Size | str) -> 'TabularModelSize'

Create an instance based on a given Size or its string representation.

Arguments:

  • size - A Size or a str representing a Size.
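
The from_size pattern, which resolves a Size or its string form into concrete n_layers, h and d values, can be sketched with stand-in classes. Everything below is hypothetical: the Size member values, the case-insensitive string handling, and in particular the numbers in _PRESETS are placeholders chosen for illustration, not the presets used by the library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Size(Enum):
    # Stand-in for the documented enumeration; member values are assumed.
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


@dataclass(frozen=True)
class ModelSize:
    n_layers: int  # number of internal layers
    h: int         # number of heads
    d: int         # size of the internal dimension

    @classmethod
    def from_size(cls, size: Union[Size, str]) -> "ModelSize":
        """Resolve a Size, or its name as a string, into a preset."""
        if isinstance(size, str):
            # Assumption: strings are matched by member name, ignoring case.
            size = Size[size.strip().upper()]
        return cls(**_PRESETS[size])


# Placeholder numbers for illustration only; NOT the library's presets.
_PRESETS = {
    Size.SMALL: {"n_layers": 2, "h": 2, "d": 64},
    Size.MEDIUM: {"n_layers": 4, "h": 4, "d": 128},
    Size.LARGE: {"n_layers": 8, "h": 8, "d": 256},
}
```

In this sketch, ModelSize.from_size("small") and ModelSize.from_size(Size.SMALL) return equal objects, and an unknown string raises a KeyError.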

class TextModelSize

Model size for TextModel objects.

Arguments:

  • n_layers - Number of internal layers.
  • h - Number of heads.
  • d - Size of the internal dimension.

TextModelSize.from_size

@classmethod
def from_size(cls, size: Size | str) -> 'TextModelSize'

Create an instance based on a given Size or its string representation.

Arguments:

  • size - A Size or a str representing a Size.