Evaluation

report

def report(data_train: RelationalData,
           data_test: RelationalData,
           data_synth: RelationalData,
           path: Path | str,
           n_max_train: int | None = 5_000,
           n_max_test: int | None = 5_000) -> None

Collect summary statistics for evaluating synthetic data in terms of data quality and privacy protection, and save the resulting report to path.

Arguments:

data_train - A RelationalData object containing the original training data.
data_test - A RelationalData object containing the original test data.
data_synth - A RelationalData object containing the generated synthetic data.
path - A path to save the report.
n_max_train - The maximum number of training samples per table to use in the report.
n_max_test - The maximum number of test samples per table to use in the report.

compute_privacy_stats

def compute_privacy_stats(
        data_train: RelationalData,
        data_synth: RelationalData,
        n_max: int | None = None,
        n_folds_std: int | None = 10) -> dict[str, PrivacyStats | None]

Compute privacy statistics for the evaluation of synthetic data.

Arguments:

data_train - A RelationalData object containing the original training data.
data_synth - A RelationalData object containing the generated synthetic data.
n_max - The maximum number of samples per table (for both train and synth data) to use in the computation.
n_folds_std - The number of folds used to compute the standard deviation. If None, the standard deviation is not computed. Default: 10.

Returns:

A dictionary mapping each table name to a PrivacyStats object, or to None if the computation failed for that table.