Datasets

class TabularDataset

TabularDataset.from_data

@classmethod
def from_data(cls: type["TabularDataset"],
              data: RelationalData,
              preproc: TabularPreproc,
              on_disk: bool = False,
              path: Path | str | None = None,
              max_block_size: int = 0) -> "TabularDataset"

Build a TabularDataset from the input data.

Arguments:

data - A RelationalData object with the data to be processed.
preproc - The TabularPreproc object used to preprocess the data.
on_disk - Whether to save the processed data on disk. If True, the data is loaded from disk one batch at a time during training. This may slow training slightly, but it reduces memory consumption.
path - The path to the directory in which to save the processed data on disk. If None, a temporary directory is used.
max_block_size - The maximum sequence length used during training. It must be larger than the sequence length of every table, and is therefore available only for multi-table datasets. If 0, no maximum sequence length is imposed.

Returns:

A TabularDataset object.

TabularDataset.from_disk

@classmethod
def from_disk(cls: type["TabularDataset"],
              preproc: TabularPreproc,
              path: Path | str,
              max_block_size: int = 0) -> "TabularDataset"

Load a TabularDataset from disk.

Arguments:

preproc - The TabularPreproc object used to preprocess the data.
path - The path to the directory where the processed data is stored on disk.
max_block_size - The maximum sequence length used during training. It must be larger than the sequence length of every table, and is therefore available only for multi-table datasets. If 0, no maximum sequence length is imposed.

Returns:

A TabularDataset object.
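
The constraints stated for max_block_size can be written down as a short check. This is a sketch under stated assumptions, not the library's validation logic: check_max_block_size and the table_seq_lens mapping are hypothetical, and "larger than" is read here as strictly greater.

```python
def check_max_block_size(max_block_size: int,
                         table_seq_lens: dict[str, int]) -> None:
    """Raise ValueError if max_block_size violates the documented constraints.

    table_seq_lens maps each table name to its sequence length (hypothetical input).
    """
    if max_block_size == 0:
        return  # 0 means no maximum sequence length is imposed
    if len(table_seq_lens) < 2:
        raise ValueError(
            "max_block_size is available only for multi-table datasets"
        )
    # Assumption: the block size must be strictly greater than every table's length.
    too_long = {name: n for name, n in table_seq_lens.items()
                if n >= max_block_size}
    if too_long:
        raise ValueError(
            f"max_block_size={max_block_size} must be larger than the sequence "
            f"length of every table; offending tables: {too_long}"
        )


check_max_block_size(0, {"users": 10})                 # no limit requested
check_max_block_size(64, {"users": 10, "orders": 20})  # larger than every table
```

A value such as 16 with the same two tables would be rejected, because the "orders" table alone already needs a sequence of length 20.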

class TextDataset

TextDataset.from_data

@classmethod
def from_data(cls: type["TextDataset"],
              data: RelationalData,
              preproc: TextPreproc,
              on_disk: bool = False,
              path: Path | str | None = None) -> "TextDataset"

Build a TextDataset from the input data.

Arguments:

data - A RelationalData object with the data to be processed.
preproc - The TextPreproc object used to preprocess the data.
on_disk - Whether to save the processed data on disk. If True, the data is loaded from disk one batch at a time during training. This may slow training slightly, but it reduces memory consumption.
path - The path to the directory in which to save the processed data on disk. If None, a temporary directory is used.

Returns:

A TextDataset object.

TextDataset.from_disk

@classmethod
def from_disk(cls: type["TextDataset"],
              preproc: TextPreproc,
              path: Path | str) -> "TextDataset"

Load a TextDataset from disk.

Arguments:

preproc - The TextPreproc object used to preprocess the data.
path - The path to the directory where the processed data is stored on disk.

Returns:

A TextDataset object.