Sample scripts for synthetic data generation

Single table

Read the Adult dataset from CSV, train a MultiLevelSynth model, generate a report on the held-out test data, sample a synthetic dataset and finally save it to disk.

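```python
from aindo.synth import RelationalData, MultiLevelSynth

# Load the data from the CSV file(s) in the directory
data = RelationalData.from_dir(data_dir='path/to/data/dir/')
data_train, data_test = data.train_test_split()

# Train the model on the training split only
model = MultiLevelSynth(schema=data.schema)
model.train(train_data=data_train, n_epochs=50)
# Write the report, computed against the held-out test split, to 'out'
model.report(data_test, out_dir='out')

# Sample the same number of samples as the original data and save them as CSV
data_synth = model.sample(n_samples=data.n_samples)
data_synth.to_csv(out_dir='out/synth')
```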
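The script evaluates the model on data it was not trained on. As an illustration of what a holdout split does, here is a minimal plain-Python sketch. It is not the aindo.synth implementation: the helper name `holdout_split`, the default test fraction of 0.2 and the seed are hypothetical choices, and the rows are assumed to be a list of records.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(
    rows: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: randomly hold out a fraction of rows as a test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    # Shuffle row indices with a fixed seed so the split is reproducible
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    # Keep the original row order within each split
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


# Toy records loosely shaped like Adult rows (made-up values)
rows = [{"age": 20 + i, "income": ">50K" if i % 3 == 0 else "<=50K"} for i in range(10)]
train, test = holdout_split(rows, test_fraction=0.2)
print(len(train), len(test))
```

Because the test rows never reach training, a report computed on them measures how well the synthetic data generalizes rather than how closely it reproduces the training rows.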

Relational data

Read the BasketballMan dataset, specifying the primary and foreign keys that link its tables, train a GraphSynth model, generate a report on the held-out test data, sample a synthetic dataset and finally save it to disk.

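```python
from aindo.synth import RelationalData, GraphSynth

# Primary key of the players table
pks = {'players': 'playerID'}
# The playerID columns of season and all_star refer to the players table
fks = {'season': {'playerID': 'players'}, 'all_star': {'playerID': 'players'}}
data = RelationalData.from_dir(data_dir='path/to/data/dir', primary_keys=pks, foreign_keys=fks)
data_train, data_test = data.train_test_split()

# Train the model on the training split only, not on the full data
model = GraphSynth(schema=data.schema)
model.train(train_data=data_train, n_epochs=50)
# Write the report, computed against the held-out test split, to 'out'
model.report(data_test, out_dir='out')

# Sample the same number of samples as the original data and save them as CSV
data_synth = model.sample(n_samples=data.n_samples)
data_synth.to_csv(out_dir='out/synth')
```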
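The `pks` and `fks` dictionaries describe how the tables are linked: each `playerID` in `season` and `all_star` is expected to match a `playerID` in `players`. The plain-Python sketch below makes that relationship concrete. It is not part of aindo.synth: the helpers `check_foreign_keys` and `split_by_parent` are hypothetical, tables are assumed to be lists of dicts, and the split assumes every child table references the root table directly. It only illustrates one way to split relational data so that a player and all of that player's rows land on the same side; how `RelationalData.train_test_split` actually splits is not described here.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, object]
Tables = Dict[str, List[Row]]


def check_foreign_keys(tables: Tables, pks: Dict[str, str],
                       fks: Dict[str, Dict[str, str]]) -> List[str]:
    """Hypothetical helper: report child rows whose foreign key has no parent row."""
    errors = []
    for child, columns in fks.items():
        for column, parent in columns.items():
            parent_keys = {row[pks[parent]] for row in tables[parent]}
            for i, row in enumerate(tables[child]):
                if row[column] not in parent_keys:
                    errors.append(f"{child}[{i}].{column}={row[column]!r} has no match in {parent}")
    return errors


def split_by_parent(tables: Tables, pks: Dict[str, str], fks: Dict[str, Dict[str, str]],
                    root: str, test_fraction: float = 0.2, seed: int = 0) -> Tuple[Tables, Tables]:
    """Hypothetical helper: split root keys at random, then route child rows by foreign key."""
    keys = sorted(row[pks[root]] for row in tables[root])
    random.Random(seed).shuffle(keys)
    test_keys = set(keys[:round(len(keys) * test_fraction)])
    train: Tables = {}
    test: Tables = {}
    for name, rows in tables.items():
        if name == root:
            key_col = pks[root]
        else:
            # Assumes each child table has a foreign key pointing directly at the root
            key_col = next(col for col, parent in fks[name].items() if parent == root)
        train[name] = [r for r in rows if r[key_col] not in test_keys]
        test[name] = [r for r in rows if r[key_col] in test_keys]
    return train, test


pks = {'players': 'playerID'}
fks = {'season': {'playerID': 'players'}, 'all_star': {'playerID': 'players'}}
# Made-up toy tables; the last all_star row deliberately points at a missing player
tables = {
    'players': [{'playerID': f'p{i}', 'height': 190 + i} for i in range(5)],
    'season': [{'playerID': f'p{i % 5}', 'year': 2000 + i} for i in range(10)],
    'all_star': [{'playerID': 'p1', 'year': 2003}, {'playerID': 'p9', 'year': 2004}],
}
print(check_foreign_keys(tables, pks, fks))
train, test = split_by_parent(tables, pks, fks, root='players', test_fraction=0.4)
```

Splitting on the parent key rather than on individual rows avoids a player's seasons ending up partly in the training data and partly in the test data, which would let information about test players leak into training.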