Bivariate distributions
A bivariate distribution is the joint distribution of the values observed for two variables in a dataset. Whereas a marginal distribution describes a single variable on its own, a bivariate distribution describes how two variables vary together. Each point in the bivariate distribution corresponds to one combination of values of the two variables, so the distribution shows which combinations occur and how often.
Bivariate empirical distribution comparison using heatmaps
Bivariate empirical distribution comparison plots are often drawn as heatmaps, which visualize the joint frequency distribution of two variables in a dataset. A heatmap partitions the joint value space of the two variables into bins and shows, typically through color intensity, the frequency or density of observations that fall within each bin.
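The binning step behind such a heatmap can be sketched with NumPy's histogram2d. This is a minimal illustration, not the implementation of any particular tool: the helper name joint_histogram, the choice of 10 bins, and the simulated data are all assumptions made for the example.

```python
import numpy as np


def joint_histogram(x, y, bins=10, value_range=None, density=False):
    """Bin paired observations into a 2D grid (hypothetical helper).

    Returns the grid of counts (or densities when density=True) together
    with the bin edges along each axis. counts[i, j] refers to the i-th
    bin of x and the j-th bin of y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    counts, x_edges, y_edges = np.histogram2d(
        x, y, bins=bins, range=value_range, density=density
    )
    return counts, x_edges, y_edges


# Simulated example data (an assumption): two positively related variables.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.8 * x + rng.normal(scale=0.6, size=1000)

counts, x_edges, y_edges = joint_histogram(x, y, bins=10)
print(counts.shape)       # one cell per (x bin, y bin) pair
print(int(counts.sum()))  # every observation falls into exactly one cell
```

The resulting grid is what a heatmap colors cell by cell. When passing it to an image-style plotting function, note that the first axis of counts follows x, so the grid usually needs to be transposed to put x on the horizontal axis.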
Assessing synthetic data quality through bivariate distribution comparison
Bivariate empirical distribution comparison plots help in evaluating the quality of generated synthetic data. By comparing the heatmaps of real and synthetic data, users can assess how well the synthetic data replicates the joint distribution of the two variables in the real data.
Key aspects to consider in the comparison include the overall pattern, strength, and direction of the relationship between the variables, as well as any discrepancies between the real and synthetic distributions. These comparisons help users check whether the synthetic data captures the joint distributional characteristics of the original dataset, which in turn supports a judgment about its fidelity and suitability for downstream analysis.
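A visual comparison can be complemented with a single number that summarizes how far apart the two heatmaps are. The sketch below bins the real and synthetic data on a shared grid and computes the total variation distance between the two binned distributions. The choice of this metric, the helper name binned_tvd, the bin count, and the simulated data are assumptions made for illustration; the text above does not prescribe any specific measure.

```python
import numpy as np


def binned_tvd(real_xy, synth_xy, bins=10):
    """Total variation distance between two binned bivariate distributions.

    Hypothetical helper: both inputs are arrays of shape (n, 2). The two
    datasets are binned on the same grid so that cells are comparable.
    Returns a value in [0, 1]; 0 means identical binned distributions.
    """
    real_xy = np.asarray(real_xy, dtype=float)
    synth_xy = np.asarray(synth_xy, dtype=float)

    # Shared bin range covering both datasets.
    both = np.vstack([real_xy, synth_xy])
    value_range = [
        (both[:, 0].min(), both[:, 0].max()),
        (both[:, 1].min(), both[:, 1].max()),
    ]

    real_counts, _, _ = np.histogram2d(
        real_xy[:, 0], real_xy[:, 1], bins=bins, range=value_range
    )
    synth_counts, _, _ = np.histogram2d(
        synth_xy[:, 0], synth_xy[:, 1], bins=bins, range=value_range
    )

    # Convert counts to relative frequencies so sample sizes may differ.
    p = real_counts / real_counts.sum()
    q = synth_counts / synth_counts.sum()
    return 0.5 * np.abs(p - q).sum()


# Simulated example (an assumption): the "synthetic" data loses the
# correlation that is present in the "real" data.
rng = np.random.default_rng(1)
real_x = rng.normal(size=2000)
real = np.column_stack([real_x, 0.8 * real_x + rng.normal(scale=0.6, size=2000)])
synthetic = np.column_stack([rng.normal(size=2000), rng.normal(size=2000)])

print(binned_tvd(real, real))       # identical data gives a distance of 0
print(binned_tvd(real, synthetic))  # a larger value signals a mismatch
```

The value depends on the number of bins and on sample size, so it is best read as a relative indicator alongside the heatmaps rather than as an absolute quality score.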