Sample scripts for synthetic data generation
Single table
Read the Adult dataset from CSV, train a MultiLevelSynth model, generate a report, sample a synthetic dataset, and finally save the synthetic data to disk.
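from aindo.synth import RelationalData, MultiLevelSynth

# Load the data from the data directory and hold out a test split.
data = RelationalData.from_dir(data_dir='path/to/data/dir/')
data_train, data_test = data.train_test_split()

# Train on the training split, then generate a report using the test split.
model = MultiLevelSynth(schema=data.schema)
model.train(train_data=data_train, n_epochs=50)
model.report(data_test, out_dir='out')

# Sample as many synthetic records as the original data holds and save them as CSV.
data_synth = model.sample(n_samples=data.n_samples)
data_synth.to_csv(out_dir='out/synth')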
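The script above trains on one split and reports on the other. As a rough illustration of what such a holdout split means for a single table, here is a minimal pandas sketch. It is not the implementation of `train_test_split` in the library; the toy table, the `holdout_split` helper, and the 20% test fraction are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy table standing in for the Adult dataset.
df = pd.DataFrame({
    "age": [25, 38, 28, 44, 18, 34, 29, 63, 24, 55],
    "income": ["<=50K", "<=50K", ">50K", ">50K", "<=50K",
               "<=50K", "<=50K", ">50K", "<=50K", "<=50K"],
})


def holdout_split(table, test_fraction=0.2, seed=0):
    """Randomly pick a fraction of rows as the test set; the rest is the training set."""
    test = table.sample(frac=test_fraction, random_state=seed)
    train = table.drop(index=test.index)
    return train, test


train_df, test_df = holdout_split(df)

# The two splits are disjoint and together cover the whole table.
print(len(train_df), len(test_df))
```

Keeping the test rows out of training is what makes a report computed on them an honest comparison between real and synthetic data.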
Relational data
Read the BasketballMan dataset, declaring the primary and foreign keys that link its tables, train a GraphSynth model, generate a report, sample a synthetic dataset, and finally save the synthetic data to disk.
from aindo.synth import RelationalData, GraphSynth

# Primary key of each parent table, and the foreign keys that reference it.
pks = {'players': 'playerID'}
fks = {'season': {'playerID': 'players'}, 'all_star': {'playerID': 'players'}}
data = RelationalData.from_dir(data_dir='path/to/data/dir', primary_keys=pks, foreign_keys=fks)
data_train, data_test = data.train_test_split()

# Train on the training split only, so that the test split stays unseen for the report.
model = GraphSynth(schema=data.schema)
model.train(train_data=data_train, n_epochs=50)
model.report(data_test, out_dir='out')

# Sample as many synthetic records as the original data holds and save them as CSV.
data_synth = model.sample(n_samples=data.n_samples)
data_synth.to_csv(out_dir='out/synth')
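The `pks` and `fks` mappings in the script above only make sense if every foreign key value points at an existing primary key. A minimal pandas sketch of such a sanity check, run before loading the data, could look like the following. The toy tables and the `find_orphans` helper are hypothetical and are not part of the aindo.synth API; only the shape of the two mappings is taken from the script.

```python
import pandas as pd

# Key mappings in the same shape as in the script above.
pks = {"players": "playerID"}
fks = {"season": {"playerID": "players"}, "all_star": {"playerID": "players"}}

# Hypothetical toy tables; the real BasketballMan tables have many more columns.
tables = {
    "players": pd.DataFrame({"playerID": ["p1", "p2", "p3"]}),
    "season": pd.DataFrame({"playerID": ["p1", "p1", "p2"], "points": [10, 12, 7]}),
    "all_star": pd.DataFrame({"playerID": ["p3", "p9"], "year": [1999, 2001]}),
}


def find_orphans(tables, pks, fks):
    """Map (child table, column) to the foreign key values that have no parent row."""
    orphans = {}
    for child, columns in fks.items():
        for column, parent in columns.items():
            parent_keys = set(tables[parent][pks[parent]])
            child_keys = set(tables[child][column].dropna())
            missing = sorted(child_keys - parent_keys)
            if missing:
                orphans[(child, column)] = missing
    return orphans


# Primary keys must be unique within their table.
pk_unique = {name: tables[name][col].is_unique for name, col in pks.items()}

orphans = find_orphans(tables, pks, fks)
print(pk_unique)
print(orphans)
```

In this toy data the `all_star` table references a player `p9` that does not exist in `players`, so the check reports it. Rows like that are worth fixing or dropping before the key mappings are handed to the loader.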