Datasets
class TabularDataset
TabularDataset.from_data
Build a TabularDataset from the input data.
Arguments:
data - A RelationalData object with the data to be processed.
preproc - A TabularPreproc to preprocess the data.
on_disk - Whether to save the processed data on disk. If True, the data is loaded one batch at a time during training, which may slow training slightly but reduces memory consumption.
path - The path to the directory in which to save the processed data on disk. If None, the data is saved in a temporary directory.
max_block_size - Maximum sequence length used during training. It must be larger than the sequence length of each table, and is therefore available only for multi-table datasets. If 0, no maximum sequence length is imposed.
Returns:
A TabularDataset object.
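The on_disk and path arguments describe a trade-off: data stored on disk is read back one batch at a time, and a temporary directory is used when no path is given. The sketch below illustrates that idea with the standard library only. It is not the library's implementation; save_batches, iter_batches, the JSON batch files and the sample rows are all hypothetical names chosen for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterator, List, Optional


def save_batches(rows: List[Dict], batch_size: int, path: Optional[str] = None) -> Path:
    """Write rows to disk in batch files (hypothetical helper).

    If path is None, a temporary directory is created, mirroring the
    documented behaviour of the path argument.
    """
    out_dir = Path(path) if path is not None else Path(tempfile.mkdtemp())
    out_dir.mkdir(parents=True, exist_ok=True)
    for start in range(0, len(rows), batch_size):
        batch_file = out_dir / f"batch_{start // batch_size:05d}.json"
        batch_file.write_text(json.dumps(rows[start:start + batch_size]))
    return out_dir


def iter_batches(path: Path) -> Iterator[List[Dict]]:
    """Yield one batch at a time, so only a single batch is held in memory."""
    for batch_file in sorted(path.glob("batch_*.json")):
        yield json.loads(batch_file.read_text())


# Toy data standing in for preprocessed rows (assumption, not real output).
rows = [{"id": i, "value": i * i} for i in range(10)]

# path=None, so the batches land in a temporary directory.
storage_dir = save_batches(rows, batch_size=4)
batch_sizes = [len(batch) for batch in iter_batches(storage_dir)]
```

Reading from disk adds I/O on every batch, which is the likely source of the slight slowdown mentioned above, while peak memory is bounded by the size of one batch rather than the whole dataset.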
TabularDataset.from_disk
Load a TabularDataset from disk.
Arguments:
preproc - The TabularPreproc object used to preprocess the data.
path - The path to the directory where the processed data is stored on disk.
max_block_size - Maximum sequence length used during training. It must be larger than the sequence length of each table, and is therefore available only for multi-table datasets. If 0, no maximum sequence length is imposed.
Returns:
A TabularDataset object.
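The constraint on max_block_size can be made concrete with a small check. This is a minimal sketch of the rule as stated in the argument description, not the library's own validation: check_max_block_size and the table names and lengths are hypothetical, and the multi-table condition is implemented here as "at least two tables", which is an assumption.

```python
from typing import Dict


def check_max_block_size(max_block_size: int, table_seq_lens: Dict[str, int]) -> None:
    """Raise ValueError if max_block_size violates the documented rule.

    table_seq_lens maps each table name to its sequence length
    (a hypothetical input for this illustration).
    """
    if max_block_size == 0:
        # 0 means no maximum sequence length is imposed.
        return
    if len(table_seq_lens) < 2:
        # Assumption: "multi-table" means two or more tables.
        raise ValueError("max_block_size is only available for multi-table datasets")
    too_long = {
        name: length
        for name, length in table_seq_lens.items()
        if length >= max_block_size  # must be strictly larger than each table
    }
    if too_long:
        raise ValueError(f"max_block_size={max_block_size} is not larger than: {too_long}")


# Hypothetical per-table sequence lengths.
seq_lens = {"players": 5, "seasons": 8}

check_max_block_size(0, seq_lens)   # no limit imposed
check_max_block_size(16, seq_lens)  # larger than every table, accepted
```

A value equal to a table's sequence length is rejected here because the description says the maximum must be larger than each table's sequence length.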
class TextDataset
TextDataset.from_data
Build a TextDataset from the input data.
Arguments:
data - A RelationalData object with the data to be processed.
preproc - A TextPreproc to preprocess the data.
on_disk - Whether to save the processed data on disk. If True, the data is loaded one batch at a time during training, which may slow training slightly but reduces memory consumption.
path - The path to the directory in which to save the processed data on disk. If None, the data is saved in a temporary directory.
Returns:
A TextDataset object.
TextDataset.from_disk
Load a TextDataset from disk.
Arguments:
preproc - The TextPreproc object used to preprocess the data.
path - The path to the directory where the processed data is stored on disk.
Returns:
A TextDataset object.