
Datasets

class TabularDataset

TabularDataset.from_data

@classmethod
def from_data(cls: type['TabularDataset'],
              data: RelationalData,
              preproc: TabularPreproc,
              on_disk: bool = False,
              path: Path | str | None = None,
              max_block_size: int = 0) -> 'TabularDataset'

Build a TabularDataset from the input data.

Arguments:

  • data - A RelationalData object with the data to be processed.
  • preproc - The TabularPreproc object used to preprocess the data.
  • on_disk - Whether to save the processed data on disk. If True, the data is loaded one batch at a time during training. This may slow training slightly but reduces memory consumption.
  • path - Path to the directory in which to save the processed data. If None, a temporary directory is used.
  • max_block_size - Maximum sequence length used during training. It must be larger than the sequence length of each table, and is therefore available only for multi-table datasets. If 0, no maximum sequence length is imposed.

Returns:

A TabularDataset object.
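
The trade-off behind on_disk and path can be illustrated with a small, self-contained sketch. It does not show how TabularDataset stores data internally: the helper names save_batches and iter_batches, the pickle format, and the file naming are all assumptions made for illustration. It only mirrors the documented behaviour, namely that data is written to a directory (a temporary one when path is None) and read back one batch at a time.

```python
from __future__ import annotations

import pickle
import tempfile
from pathlib import Path


def save_batches(rows: list, batch_size: int, path: Path | str | None = None) -> Path:
    """Write rows to disk in fixed-size batches (hypothetical helper).

    If path is None, a temporary directory is created, mirroring the
    documented default of the path argument.
    """
    if path is None:
        save_dir = Path(tempfile.mkdtemp(prefix="dataset_"))
    else:
        save_dir = Path(path)
        save_dir.mkdir(parents=True, exist_ok=True)
    for start in range(0, len(rows), batch_size):
        batch_file = save_dir / f"batch_{start // batch_size:05d}.pkl"
        with open(batch_file, "wb") as f:
            pickle.dump(rows[start:start + batch_size], f)
    return save_dir


def iter_batches(save_dir: Path | str):
    """Yield one batch at a time, so only a single batch is held in memory."""
    for batch_file in sorted(Path(save_dir).glob("batch_*.pkl")):
        with open(batch_file, "rb") as f:
            yield pickle.load(f)
```

Calling save_batches(list(range(10)), batch_size=4) and then iterating over iter_batches on the returned directory yields three batches. Each iteration costs a file read, which is the source of the slowdown mentioned above, while peak memory stays bounded by the batch size.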

TabularDataset.from_disk

@classmethod
def from_disk(cls: type['TabularDataset'],
              preproc: TabularPreproc,
              path: Path | str,
              max_block_size: int = 0) -> 'TabularDataset'

Load a TabularDataset from disk.

Arguments:

  • preproc - The TabularPreproc object used to preprocess the data.
  • path - The path to the directory where the processed data is stored on disk.
  • max_block_size - Maximum sequence length used during training. It must be larger than the sequence length of each table, and is therefore available only for multi-table datasets. If 0, no maximum sequence length is imposed.

Returns:

A TabularDataset object.
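
The constraints on max_block_size can be read as a simple validation rule. The sketch below is one interpretation and not the library's code: the function name effective_max_block_size and the table_seq_lens mapping are hypothetical, and "larger than" is assumed to mean strictly larger.

```python
from __future__ import annotations


def effective_max_block_size(max_block_size: int,
                             table_seq_lens: dict[str, int]) -> int | None:
    """Return the sequence-length limit to apply, or None when there is none.

    table_seq_lens maps each table name to its sequence length
    (a hypothetical input, used only for this illustration).
    """
    if max_block_size < 0:
        raise ValueError("max_block_size must be non-negative")
    if max_block_size == 0:
        # 0 means no maximum sequence length is imposed.
        return None
    if len(table_seq_lens) < 2:
        raise ValueError("max_block_size applies only to multi-table datasets")
    # Assumption: the limit must be strictly larger than every table's length.
    too_long = {name: n for name, n in table_seq_lens.items() if n >= max_block_size}
    if too_long:
        raise ValueError(f"max_block_size={max_block_size} is not larger than: {too_long}")
    return max_block_size
```

Under these assumptions, a limit of 10 is accepted for tables with sequence lengths 5 and 7, while a limit of 6 is rejected because one table already has length 7.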

class TextDataset

TextDataset.from_data

@classmethod
def from_data(cls: type['TextDataset'],
              data: RelationalData,
              preproc: TextPreproc,
              on_disk: bool = False,
              path: Path | str | None = None) -> 'TextDataset'

Build a TextDataset from the input data.

Arguments:

  • data - A RelationalData object with the data to be processed.
  • preproc - The TextPreproc object used to preprocess the data.
  • on_disk - Whether to save the processed data on disk. If True, the data is loaded one batch at a time during training. This may slow training slightly but reduces memory consumption.
  • path - Path to the directory in which to save the processed data. If None, a temporary directory is used.

Returns:

A TextDataset object.

TextDataset.from_disk

@classmethod
def from_disk(cls: type['TextDataset'],
              preproc: TextPreproc,
              path: Path | str) -> 'TextDataset'

Load a TextDataset from disk.

Arguments:

  • preproc - The TextPreproc object used to preprocess the data.
  • path - The path to the directory where the processed data is stored on disk.

Returns:

A TextDataset object.