
Evaluation

report

def report(data_train: RelationalData,
           data_test: RelationalData,
           data_synth: RelationalData,
           path: Path | str,
           n_max_train: int | None = 5_000,
           n_max_test: int | None = 5_000) -> None

Collect summary statistics for the evaluation of synthetic data in terms of data quality and privacy protection.

Arguments:

  • data_train - A RelationalData object containing the original training data.
  • data_test - A RelationalData object containing the original test data.
  • data_synth - A RelationalData object containing the generated synthetic data.
  • path - The path where the report is saved.
  • n_max_train - The maximum number of samples per table (for train data) to use in the report. Default: 5_000.
  • n_max_test - The maximum number of samples per table (for test data) to use in the report. Default: 5_000.
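
As an illustration of how a per-table cap such as n_max_train or n_max_test can behave, here is a minimal sketch. The helper cap_rows is hypothetical and not part of the library. It assumes that rows are sampled at random and that None disables the cap; neither detail is stated in this reference.

```python
import random


def cap_rows(rows, n_max, seed=0):
    """Hypothetical helper: keep at most n_max rows of one table.

    Assumption: None means "no cap" and rows are drawn at random.
    The library's actual sampling strategy is not documented here.
    """
    rows = list(rows)
    if n_max is None or len(rows) <= n_max:
        return rows
    return random.Random(seed).sample(rows, n_max)


# Toy tables represented as lists of row ids.
train_rows = list(range(12_000))
test_rows = list(range(3_000))

capped_train = cap_rows(train_rows, 5_000)  # larger than the cap: reduced
capped_test = cap_rows(test_rows, 5_000)    # smaller than the cap: unchanged
uncapped_train = cap_rows(train_rows, None)  # cap disabled
```

Under these assumptions, a table smaller than the cap is passed through unchanged, so the cap only affects large tables.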

compute_privacy_stats

def compute_privacy_stats(
    data_train: RelationalData,
    data_synth: RelationalData,
    n_max: int | None = None,
    n_folds_std: int | None = 10) -> dict[str, PrivacyStats | None]

Compute privacy statistics for the evaluation of synthetic data.

Arguments:

  • data_train - A RelationalData object containing the original training data.
  • data_synth - A RelationalData object containing the generated synthetic data.
  • n_max - The maximum number of samples per table (for both train and synth data) to use in the computation.
  • n_folds_std - The number of folds to use in the computation of the standard deviation. If None, the computation is not performed. Default: 10.
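
The role of n_folds_std can be pictured with a generic fold-based estimate. The sketch below is an assumption about the general technique, not the library's implementation: the helper fold_std is hypothetical, and it estimates spread by computing a statistic (here the mean) on each fold and taking the standard deviation across folds.

```python
import statistics


def fold_std(values, n_folds):
    """Hypothetical helper: standard deviation of per-fold means.

    Mirrors the documented behaviour that None skips the computation.
    How the library actually builds and uses folds is an assumption.
    """
    if n_folds is None:
        return None
    # Interleaved split: fold i takes items i, i + n_folds, ...
    folds = [values[i::n_folds] for i in range(n_folds)]
    fold_means = [statistics.fmean(fold) for fold in folds if fold]
    if len(fold_means) < 2:
        return None  # a standard deviation needs at least two folds
    return statistics.stdev(fold_means)


constant = [1.0] * 20
skipped = fold_std(constant, None)  # computation not performed
spread = fold_std(constant, 10)     # identical folds give zero spread
```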

Returns:

A dictionary mapping each table to a PrivacyStats object (or None in case of error).
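
Because a table's entry may be None, callers will likely want to separate usable results from failed ones before reading them. The following sketch shows one way to do that. The PrivacyStats class below is a hypothetical stand-in with a made-up field, since the real class's attributes are not described in this reference, and split_results is likewise a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class PrivacyStats:
    """Hypothetical stand-in for the library's PrivacyStats class."""
    score: float  # made-up field, for illustration only


def split_results(stats):
    """Separate tables with statistics from tables whose entry is None."""
    ok = {table: s for table, s in stats.items() if s is not None}
    failed = sorted(table for table, s in stats.items() if s is None)
    return ok, failed


# Shaped like the documented return value: dict[str, PrivacyStats | None].
example = {"customers": PrivacyStats(score=0.1), "orders": None}
ok, failed = split_results(example)
```

With the example dictionary, ok holds only the "customers" entry and failed lists "orders", so downstream code never dereferences a None value.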