Datasets
class TabularDataset
TabularDataset.from_data
@classmethod
def from_data(cls: type["TabularDataset"], data: RelationalData, preproc: TabularPreproc, on_disk: bool = False, path: Path | str | None = None, max_block_size: int = 0) -> "TabularDataset"
Build a TabularDataset from the input data.
Arguments:
data
- A RelationalData object with the data to be processed.
preproc
- A TabularPreproc object used to preprocess the data.
on_disk
- Whether to save the processed data on disk. If True, the data will be loaded one batch at a time during training. This may slightly slow down training, but reduces memory consumption.
path
- The path to a directory where the processed data will be saved on disk. If None, it will be saved in a temporary directory.
max_block_size
- Maximum sequence length that will be used during training. It must be larger than the sequence length of each table, and is therefore available only for multi-table datasets. If 0, no maximum sequence length is imposed.
Returns:
A TabularDataset object.
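To make the on_disk trade-off concrete, here is a minimal stdlib sketch of disk-backed batching. It is not the library's implementation: the class DiskBackedBatches, the one-pickle-file-per-batch layout, and the integer rows are hypothetical. It only mirrors two documented behaviors: when path is None the files go to a temporary directory, and iteration loads one batch at a time.

```python
from __future__ import annotations

import pickle
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


class DiskBackedBatches:
    """Hypothetical sketch: keep processed batches on disk, load them one at a time."""

    def __init__(self, batches: Iterable[list[int]], path: Path | str | None = None) -> None:
        # Mirror the documented default: no path means a temporary directory.
        self.dir = Path(path) if path is not None else Path(tempfile.mkdtemp())
        self.dir.mkdir(parents=True, exist_ok=True)
        self.n_batches = 0
        # Write each processed batch to its own file as it is produced.
        for batch in batches:
            with open(self.dir / f"batch_{self.n_batches}.pkl", "wb") as f:
                pickle.dump(batch, f)
            self.n_batches += 1

    def __iter__(self) -> Iterator[list[int]]:
        # Read back a single batch per step instead of holding all of them in memory.
        for i in range(self.n_batches):
            with open(self.dir / f"batch_{i}.pkl", "rb") as f:
                yield pickle.load(f)


# Usage: iterate the way a training loop would.
for batch in DiskBackedBatches([[1, 2], [3, 4], [5]]):
    print(batch)
```

Peak memory during iteration is roughly one batch rather than the whole dataset; the price is a file read and unpickling step per batch, which corresponds to the slowdown described for on_disk=True.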
TabularDataset.from_disk
@classmethod
def from_disk(cls: type["TabularDataset"], preproc: TabularPreproc, path: Path | str, max_block_size: int = 0) -> "TabularDataset"
Load a TabularDataset from disk.
Arguments:
preproc
- The TabularPreproc object used to preprocess the data.
path
- The path to the directory where the processed data is stored on disk.
max_block_size
- Maximum sequence length that will be used during training. It must be larger than the sequence length of each table, and is therefore available only for multi-table datasets. If 0, no maximum sequence length is imposed.
Returns:
A TabularDataset object.
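The documented constraint on max_block_size can be read as a simple validation rule. The function below is a hypothetical sketch of that rule, not the library's own check: validate_max_block_size and the per-table sequence lengths are made up for illustration, and rejecting single-table datasets outright is this sketch's interpretation of "available only for multi-table datasets".

```python
from __future__ import annotations


def validate_max_block_size(table_seq_lengths: dict[str, int], max_block_size: int) -> None:
    """Hypothetical check mirroring the documented max_block_size constraint."""
    if max_block_size == 0:
        # Documented default: 0 imposes no maximum sequence length.
        return
    if max_block_size < 0:
        raise ValueError("max_block_size must be 0 or a positive integer")
    # Assumption: with a single table a cap is rejected, per "multi-table only".
    if len(table_seq_lengths) < 2:
        raise ValueError("max_block_size is only available for multi-table datasets")
    # Documented rule: the cap must be larger than every table's sequence length.
    offending = {name: n for name, n in table_seq_lengths.items() if n >= max_block_size}
    if offending:
        raise ValueError(
            f"max_block_size={max_block_size} is not larger than the sequence length of: {offending}"
        )


lengths = {"customers": 40, "orders": 120}  # hypothetical per-table sequence lengths
validate_max_block_size(lengths, 0)    # no cap: accepted
validate_max_block_size(lengths, 256)  # larger than every table: accepted
try:
    validate_max_block_size(lengths, 100)  # "orders" needs more than 100
except ValueError as err:
    print(err)
```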
class TextDataset
TextDataset.from_data
@classmethod
def from_data(cls: type["TextDataset"], data: RelationalData, preproc: TextPreproc, on_disk: bool = False, path: Path | str | None = None) -> "TextDataset"
Build a TextDataset from the input data.
Arguments:
data
- A RelationalData object with the data to be processed.
preproc
- A TextPreproc object used to preprocess the data.
on_disk
- Whether to save the processed data on disk. If True, the data will be loaded one batch at a time during training. This may slightly slow down training, but reduces memory consumption.
path
- The path to a directory where the processed data will be saved on disk. If None, it will be saved in a temporary directory.
Returns:
A TextDataset object.
TextDataset.from_disk
@classmethod
def from_disk(cls: type["TextDataset"], preproc: TextPreproc, path: Path | str) -> "TextDataset"
Load a TextDataset from disk.
Arguments:
preproc
- The TextPreproc object used to preprocess the data.
path
- The path to the directory where the processed data is stored on disk.
Returns:
A TextDataset object.
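Both from_disk methods take the preprocessor as an argument instead of reading it from path, so the processed files are presumably only meaningful together with the preprocessor that produced them. The sketch below shows one hypothetical way to guard that pairing: it stores a hash of a preprocessing configuration next to the data and refuses to load on a mismatch. save_processed, load_processed, the file names, and the dict-based configuration are all assumptions for illustration; the documentation does not say whether the library performs any such check.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(config: dict) -> str:
    # Stable SHA-256 of a JSON-serializable preprocessing configuration.
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode("utf-8")).hexdigest()


def save_processed(rows: list[list[int]], config: dict, path: Path | str) -> None:
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    (path / "data.json").write_text(json.dumps(rows))
    # Record which preprocessing configuration produced these rows.
    (path / "preproc.sha256").write_text(fingerprint(config))


def load_processed(config: dict, path: Path | str) -> list[list[int]]:
    path = Path(path)
    # Refuse to load data produced by a different preprocessing configuration.
    if (path / "preproc.sha256").read_text() != fingerprint(config):
        raise ValueError("processed data was produced with a different preprocessing configuration")
    return json.loads((path / "data.json").read_text())


cfg = {"tokenizer": "bpe", "max_len": 64}  # hypothetical preprocessing settings
out_dir = Path(tempfile.mkdtemp()) / "text_dataset"
save_processed([[5, 8, 13], [21]], cfg, out_dir)
print(load_processed(cfg, out_dir))
```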