Models
class TabularModel
TabularModel.build
Tabular model to generate synthetic tabular relational data.
Arguments:
preproc
- A TabularPreproc object.
size
- The size configuration of the model. Can be a TabularModelSize object, a Size object, or a string representation of the latter.
block
- The block type. The possible values depend on whether the data is single table or multi table. For single table data, either ‘free’ (default), ‘causal’, or ‘lstm’. For multi table data, either ‘free’ (default) or ‘lstm’.
dropout
- The dropout probability.
Returns:
A TabularModel instance.
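The dependence of block on the table structure can be sketched as a small check. This is only an illustration of the rule stated above: check_block and the two tuples are hypothetical names, not part of the library, and the library's own validation may behave differently.

```python
from typing import Optional

# Allowed block types, copied from the argument description above.
SINGLE_TABLE_BLOCKS = ("free", "causal", "lstm")
MULTI_TABLE_BLOCKS = ("free", "lstm")


def check_block(block: Optional[str], multi_table: bool) -> str:
    """Hypothetical helper: return a valid block type or raise ValueError."""
    allowed = MULTI_TABLE_BLOCKS if multi_table else SINGLE_TABLE_BLOCKS
    if block is None:
        # 'free' is the documented default in both cases.
        return "free"
    if block not in allowed:
        kind = "multi table" if multi_table else "single table"
        raise ValueError(
            f"block={block!r} is not valid for {kind} data; use one of {allowed}"
        )
    return block
```

With this sketch, check_block("causal", multi_table=False) is accepted, while the same value with multi_table=True raises a ValueError.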
TabularModel.generate
Generate synthetic relational data.
Arguments:
n_samples
- Desired number of samples in the root table. Must be given if and only if ctx is not provided.
ctx
- The columns of the context from which to start a conditional generation. If provided, n_samples should not be given. The content of each pd.DataFrame must match the context columns provided to the TabularPreproc:
1. If only a subset of the root table’s columns is provided, the model will generate the foreign keys.
2. If a subset of columns for each table is provided, each pd.DataFrame must also contain the primary and foreign keys, and the generated synthetic data will have the same keys provided by the context. The foreign keys referring to lookup tables should be treated as feature columns, not as foreign keys.
batch_size
- Batch size used during generation. If 0, all data is generated in a single batch.
max_block_size
- Maximum length for each generated sample. Active only for multi-table datasets and for generation from the root table (denoted above as 1.). If 0, no limit is enforced.
temp
- Temperature parameter for sampling.
Returns:
A RelationalData object with the generated synthetic tabular data.
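Two of the rules above lend themselves to a short sketch: n_samples and ctx are mutually exclusive, and batch_size equal to 0 means a single batch. The helpers resolve_root_rows and plan_batches are hypothetical and only mirror the documented behavior. In particular, ctx_rows stands in for the number of rows in the root context table, which is an assumption about how the row count would be derived.

```python
from typing import List, Optional


def resolve_root_rows(n_samples: Optional[int], ctx_rows: Optional[int]) -> int:
    """Hypothetical helper: number of root-table rows to generate.

    Exactly one of n_samples and ctx_rows must be given.
    """
    if (n_samples is None) == (ctx_rows is None):
        raise ValueError("provide exactly one of n_samples and ctx")
    if n_samples is not None:
        return n_samples
    assert ctx_rows is not None
    return ctx_rows


def plan_batches(n_rows: int, batch_size: int = 0) -> List[int]:
    """Hypothetical helper: split n_rows into batch sizes; 0 means one batch."""
    if n_rows < 0 or batch_size < 0:
        raise ValueError("n_rows and batch_size must be non-negative")
    if n_rows == 0:
        return []
    if batch_size == 0:
        return [n_rows]
    full, rest = divmod(n_rows, batch_size)
    # The last batch holds the remainder, if any.
    return [batch_size] * full + ([rest] if rest else [])
```

For example, plan_batches(10, 4) yields three batches of sizes 4, 4 and 2, while plan_batches(10, 0) yields a single batch of 10.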
TabularModel.save
Save the model to a checkpoint at the given path.
TabularModel.load
Load the model from the checkpoint at the given path.
class TextModel
TextModel.build
Text model to generate synthetic text columns of a table which is part of a relational structure.
Arguments:
preproc
- A TextPreproc object.
size
- The size configuration of the model. Could be either a Size object, a TextModelSize object or a string representation of such objects.
block_size
- Maximum text sequence length that the model can process.
dropout
- The dropout probability.
Returns:
A TextModel instance.
TextModel.build_from_pretrained
Build a text model from a pretrained model.
Arguments:
preproc
- A TextPreproc object.
path
- The path to the checkpoint of the pre-trained model.
block_size
- Maximum text sequence length that the model can process during fine-tuning.
Returns:
A TextModel instance with the weights loaded from the pre-trained model.
TextModel.generate
Generate text columns in the current table.
Arguments:
data
- A RelationalData object containing synthetic data.
batch_size
- Batch size used during generation. If 0, generate all data in a single batch.
max_text_len
- Maximum length for the generated text. If 0, the maximum possible value is used, namely the value of the TabularModel.max_block_size attribute.
temp
- Temperature parameter for sampling.
Returns:
A RelationalData object with the generated synthetic text data.
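The reference does not say how temp enters the sampling step. A common convention, assumed here, is to divide the logits by the temperature before applying a softmax, so that values below 1 sharpen the distribution and values above 1 flatten it. The sketch below also shows the documented meaning of max_text_len equal to 0. Both function names are hypothetical, and how values above the maximum are handled is not stated in the source, so the sketch passes them through unchanged.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temp: float = 1.0) -> List[float]:
    """Temperature-scaled softmax (assumed convention, not confirmed by the docs)."""
    if temp <= 0:
        raise ValueError("temp must be positive")
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def effective_max_len(max_text_len: int, max_block_size: int) -> int:
    """Hypothetical helper: 0 selects the largest value the model supports."""
    return max_block_size if max_text_len == 0 else max_text_len
```

Under this convention, lowering temp concentrates probability on the most likely token, which usually gives more conservative text, while raising it gives more varied output.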
TextModel.save
Save the model to a checkpoint at the given path.
TextModel.load
Load the model from the checkpoint at the given path.
class Size
Enumeration class representing different model sizes. Supported sizes are: SMALL, MEDIUM and LARGE.
class TabularModelSize
Model size for TabularModel objects.
Arguments:
n_layers
- Number of internal layers.
h
- Number of heads.
d
- Size of the internal dimension.
TabularModelSize.from_size
Create an instance based on a given Size or its string representation.
Arguments:
size
- A Size object or a str representing a Size.
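The relation between a Size preset and the three fields n_layers, h and d can be sketched with stand-in classes. DemoSize and DemoTabularModelSize are hypothetical and are not the library's classes. The numbers in the preset table are placeholders chosen for illustration, since the real values are not given in this reference, and parsing the string by member name is likewise an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class DemoSize(Enum):
    """Stand-in for the Size enumeration described above."""
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


# Placeholder (n_layers, h, d) values, NOT the library's real presets.
_DEMO_PRESETS = {
    DemoSize.SMALL: (2, 2, 64),
    DemoSize.MEDIUM: (4, 4, 128),
    DemoSize.LARGE: (8, 8, 256),
}


@dataclass(frozen=True)
class DemoTabularModelSize:
    """Stand-in for a model size with the three documented fields."""
    n_layers: int  # number of internal layers
    h: int         # number of heads
    d: int         # size of the internal dimension

    @classmethod
    def from_size(cls, size: Union[DemoSize, str]) -> "DemoTabularModelSize":
        if isinstance(size, str):
            # Assumed parsing rule: match the member name, ignoring case.
            size = DemoSize[size.strip().upper()]  # KeyError if unknown
        n_layers, h, d = _DEMO_PRESETS[size]
        return cls(n_layers=n_layers, h=h, d=d)
```

In this sketch, from_size("small") and from_size(DemoSize.SMALL) produce equal instances, and an explicit DemoTabularModelSize(n_layers=3, h=4, d=96) covers configurations outside the presets.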
class TextModelSize
Model size for TextModel objects.
Arguments:
n_layers
- Number of internal layers.
h
- Number of heads.
d
- Size of the internal dimension.
TextModelSize.from_size
Create an instance based on a given Size or its string representation.
Arguments:
size
- A Size or a str representing a Size.