Models
class TabularModel
TabularModel.build
Tabular model to generate synthetic tabular relational data.
Arguments:
preproc
- A TabularPreproc object.
size
- The size configuration of the model. Can be a TabularModelSize object, a Size object, or a string representation of the latter.
block
- The block type. The possible values depend on whether the data is single table or multi table. For single-table data: ‘free’ (default), ‘causal’, or ‘lstm’. For multi-table data: ‘free’ (default) or ‘lstm’.
dropout
- The dropout probability.
Returns:
A TabularModel instance.
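The rule for the block argument can be sketched as a small validation step. This is a minimal illustration of the constraint described above, not the library's own code: the helper resolve_block and its multi_table flag are hypothetical names introduced only for this example.

```python
from typing import Optional

# Allowed block types, as listed in the description of the block argument.
SINGLE_TABLE_BLOCKS = ("free", "causal", "lstm")
MULTI_TABLE_BLOCKS = ("free", "lstm")


def resolve_block(block: Optional[str], multi_table: bool) -> str:
    """Hypothetical helper: return a valid block type or raise ValueError."""
    allowed = MULTI_TABLE_BLOCKS if multi_table else SINGLE_TABLE_BLOCKS
    # 'free' is the documented default in both cases.
    chosen = "free" if block is None else block
    if chosen not in allowed:
        raise ValueError(
            f"block={chosen!r} is not valid here; expected one of {allowed}"
        )
    return chosen
```

Under this reading, ‘causal’ is accepted for single-table data and rejected for multi-table data.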
TabularModel.generate
Generate synthetic relational data.
Arguments:
n_samples
- Desired number of samples in the root table. Must be given if and only if ctx is not provided.
ctx
- The first columns of the root table, used as the starting point for conditional generation. If provided, n_samples must not be given.
batch_size
- Batch size used during generation. If 0, generate all data in a single batch.
max_block_size
- Maximum length of each generated sample. Active only for multi-table datasets. If 0, no limit is enforced.
temp
- Temperature parameter for sampling.
Returns:
A RelationalData object.
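Two of the argument rules above lend themselves to a short sketch: n_samples and ctx are mutually exclusive, and a batch_size of 0 means a single batch. The helpers below are hypothetical and only illustrate those rules. In particular, taking the number of samples from the number of rows in ctx is an assumption of this example, not something the reference states.

```python
from typing import List, Optional, Sequence


def resolve_n_samples(n_samples: Optional[int], ctx: Optional[Sequence]) -> int:
    """Hypothetical helper: exactly one of n_samples and ctx must be given."""
    if (n_samples is None) == (ctx is None):
        raise ValueError("Provide exactly one of n_samples and ctx.")
    # Assumption: with a context, one sample is generated per context row.
    return len(ctx) if ctx is not None else n_samples


def batch_sizes(total: int, batch_size: int) -> List[int]:
    """Hypothetical helper: split total samples into batches (0 = one batch)."""
    if total < 0 or batch_size < 0:
        raise ValueError("total and batch_size must be non-negative.")
    if total == 0:
        return []
    if batch_size == 0:
        return [total]
    full, rest = divmod(total, batch_size)
    return [batch_size] * full + ([rest] if rest else [])
```

For instance, batch_sizes(10, 4) yields two full batches and one remainder batch, while batch_sizes(10, 0) yields a single batch of 10.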
class TextModel
TextModel.build
Text model to generate synthetic text columns of a table which is part of a relational structure.
Arguments:
preproc
- A TextPreproc object.
size
- The size configuration of the model. Can be a Size object, a TextModelSize object, or a string representation of such objects.
block_size
- Maximum text sequence length that the model can process.
dropout
- The dropout probability.
Returns:
A TextModel instance.
TextModel.build_from_pretrained
Build a text model from a pretrained model.
Arguments:
preproc
- A TextPreproc object.
path
- The path to the checkpoint of the pretrained model.
block_size
- Maximum text sequence length that the model can process during fine-tuning.
Returns:
A TextModel instance with the weights loaded from the pretrained model.
TextModel.generate
Generate text columns in the current table.
Arguments:
data
- A RelationalData object containing synthetic data.
batch_size
- Batch size used during generation. If 0, generate all data in a single batch.
max_text_len
- Maximum length of the generated text. If 0, the maximum possible value is used, namely the value of the TabularModel.max_block_size attribute.
temp
- Temperature parameter for sampling.
Returns:
A RelationalData object.
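The temp argument is described only as a temperature parameter for sampling. A common interpretation, assumed here and not confirmed by this reference, is softmax temperature scaling: logits are divided by temp before being turned into probabilities, so values below 1 sharpen the distribution and values above 1 flatten it. The function names below are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temp: float) -> List[float]:
    """Convert logits to probabilities after dividing them by temp."""
    if temp <= 0:
        raise ValueError("temp must be positive.")
    scaled = [x / temp for x in logits]
    shift = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits: Sequence[float], temp: float, rng: random.Random) -> int:
    """Draw one index according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temp)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With this scheme, lowering temp makes the most likely token more dominant, which tends to produce more conservative output.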
class Size
Enumeration class representing different model sizes. Supported sizes are: SMALL, MEDIUM and LARGE.
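An enumeration with a string representation can be sketched as follows. DemoSize is a stand-in written for this example rather than the library's Size class, and both its member values and the case-insensitive parsing are assumptions.

```python
from enum import Enum
from typing import Union


class DemoSize(Enum):
    """Stand-in enumeration mirroring the three documented sizes."""

    SMALL = "small"    # member values are an assumption of this sketch
    MEDIUM = "medium"
    LARGE = "large"


def parse_size(size: Union[DemoSize, str]) -> DemoSize:
    """Hypothetical helper: accept a DemoSize or its string name."""
    if isinstance(size, DemoSize):
        return size
    # Look up by member name; unknown names raise KeyError.
    return DemoSize[size.strip().upper()]
```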
class TabularModelSize
Model size for TabularModel objects.
Arguments:
n_layers
- Number of internal layers.
h
- Number of heads.
d
- Size of the internal dimension.
TabularModelSize.from_size
Create an instance based on a given Size or its string representation.
Arguments:
size
- A Size object or its string representation.
class TextModelSize
Model size for TextModel objects.
Arguments:
n_layers
- Number of internal layers.
h
- Number of heads.
d
- Size of the internal dimension.
TextModelSize.from_size
Create an instance based on a given Size or its string representation.
Arguments:
size
- A Size or a str representing a Size.
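The from_size pattern shared by TabularModelSize and TextModelSize can be sketched as a class method that maps a named size to concrete values of n_layers, h and d. Everything below is illustrative: DemoSize and DemoModelSize are stand-ins, and the preset numbers are placeholders invented for the example, not the values the library uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class DemoSize(Enum):
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


# Placeholder presets, invented for this sketch only.
_PRESETS = {
    DemoSize.SMALL: {"n_layers": 2, "h": 4, "d": 128},
    DemoSize.MEDIUM: {"n_layers": 4, "h": 8, "d": 256},
    DemoSize.LARGE: {"n_layers": 8, "h": 16, "d": 512},
}


@dataclass(frozen=True)
class DemoModelSize:
    n_layers: int  # number of internal layers
    h: int         # number of heads
    d: int         # size of the internal dimension

    @classmethod
    def from_size(cls, size: Union[DemoSize, str]) -> "DemoModelSize":
        """Build an instance from a DemoSize or its string name."""
        if isinstance(size, str):
            size = DemoSize[size.strip().upper()]
        return cls(**_PRESETS[size])
```

Passing an explicit n_layers, h and d remains available for configurations that no preset covers.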