# Benchmarks

In this section, we present benchmarks that illustrate the resources consumed by standard training on specific CPU and GPU machines, for several datasets and model sizes.
## CPU training and generation

In this section, we present benchmarks for typical CPU training and synthesis times. Except for the batch size, all parameters were left at their default values. All tests were conducted on an AWS EC2 c6i.4xlarge instance, equipped with 16 CPU cores and 32 GB of RAM.
### Dataset Adult

The UCI Adult dataset, also known as the Census Income dataset, is a single-table dataset with 15 columns and 29,305 records.
Model Size | Batch Size | Training Time (s) | Training Steps | Time per step (s) | Generation Time (s) |
---|---|---|---|---|---|
Small | 256 | 320 | 4200 | 0.076 | 5 |
Medium | 256 | 425 | 3200 | 0.133 | 6 |
Large | 256 | 608 | 2600 | 0.233 | 8 |
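As a sanity check, the reported time per step is simply the total training time divided by the number of training steps; the small discrepancies come from rounding of the reported totals:

```python
# Reported rows for the Adult dataset: (training_time_s, steps, time_per_step_s)
rows = {
    "Small": (320, 4200, 0.076),
    "Medium": (425, 3200, 0.133),
    "Large": (608, 2600, 0.233),
}

for size, (total_s, steps, per_step) in rows.items():
    # Time per step = total training time / number of steps.
    print(f"{size}: {total_s / steps:.3f} s/step (reported: {per_step})")
```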
### Dataset Basket

We consider a subset of the BasketballMen dataset:

- `players`: parent table
  - 4,556 records
  - primary key: `playerID`
  - feature columns: `pos`, `height`, `weight`, `college`, `race`, `birthCity`, `birthState`, `birthCountry`
- `season`: child table
  - 21,415 records
  - foreign key: `playerID`
  - feature columns: `year`, `stint`, `tmID`, `lgID`, `GP`, `points`, `GS`, `assists`, `steals`, `minutes`
- `all_stars`: child table
  - 1,487 records
  - foreign key: `playerID`
  - feature columns: `conference`, `league_id`, `points`, `rebounds`, `assists`, `blocks`
Model Size | Batch Size | Training Time (s) | Training Steps | Time per step (s) | Generation Time (s) |
---|---|---|---|---|---|
Small | 64 | 1923 | 5800 | 0.332 | 8 |
Medium | 64 | 2339 | 3800 | 0.616 | 16 |
Large | 64 | 3312 | 2800 | 1.183 | 18 |
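The parent–child structure above can be summarized as plain data. The following dict is purely illustrative (it does not correspond to any specific SDK's configuration format); table, key, and column names are taken from the schema above:

```python
# Illustrative description of the BasketballMen subset's schema.
schema = {
    "players": {
        "role": "parent",
        "primary_key": "playerID",
        "features": ["pos", "height", "weight", "college", "race",
                     "birthCity", "birthState", "birthCountry"],
    },
    "season": {
        "role": "child",
        "foreign_key": "playerID",
        "features": ["year", "stint", "tmID", "lgID", "GP",
                     "points", "GS", "assists", "steals", "minutes"],
    },
    "all_stars": {
        "role": "child",
        "foreign_key": "playerID",
        "features": ["conference", "league_id", "points",
                     "rebounds", "assists", "blocks"],
    },
}

# Both child tables reference the parent table's primary key.
assert all(schema[t]["foreign_key"] == schema["players"]["primary_key"]
           for t in ("season", "all_stars"))
```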
### Dataset Airbnb

The Airbnb Open Data dataset includes information about Airbnb listings in New York City. In its original form, it consists of a single table, but we find it natural to rearrange it into two tables:

- `host`: parent table
  - 33,712 records
  - primary key: `host_id`
  - feature columns: `host_name`, `calculated_host_listings_count`
- `listings`: child table
  - 43,885 records
  - primary key: `id`
  - foreign key: `host_id`
  - feature columns: `neighbourhood_group`, `neighbourhood`, `latitude`, `longitude`, `room_type`, `price`, `minimum_nights`, `number_of_reviews`, `last_review`, `reviews_per_month`, `availability_365`
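The rearrangement into a `host` parent table and a `listings` child table can be sketched with pandas. The toy DataFrame below is a hypothetical stand-in for the original single table (column names taken from the schema above, values invented):

```python
import pandas as pd

# Hypothetical single-table input with host-level and listing-level columns mixed.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "host_id": [10, 10, 20],
    "host_name": ["Ann", "Ann", "Bo"],
    "calculated_host_listings_count": [2, 2, 1],
    "price": [120, 80, 95],  # stand-in for the remaining listing features
})

# Parent table: one row per host, carrying the host-level columns.
host = (df[["host_id", "host_name", "calculated_host_listings_count"]]
        .drop_duplicates("host_id")
        .reset_index(drop=True))

# Child table: one row per listing, keeping host_id as the foreign key.
listings = df.drop(columns=["host_name", "calculated_host_listings_count"])

print(len(host), len(listings))  # 2 hosts, 3 listings
```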
Model Size | Batch Size | Training Time (s) | Training Steps | Time per step (s) | Generation Time (s) |
---|---|---|---|---|---|
Small | 32 | 1892 | 18200 | 0.104 | 24 |
Medium | 32 | 2933 | 16400 | 0.179 | 29 |
Large | 32 | 3526 | 11200 | 0.315 | 39 |
## Single- and Multi-GPU training

In this section, we present benchmarks for typical CPU and GPU training times, measured on a virtual machine with 4x L40S GPUs and a 32-core AMD EPYC 9354 CPU. They illustrate the speed-up that single- and multi-GPU training can provide over CPU training.
### Dataset Berka

The Berka dataset is a collection of financial information from a Czech bank. We consider the following tables:

- `account`: parent table
  - 4,050 records
  - primary key: `account_id`
  - feature columns: `district_id`, `frequency`, `date`
- `order`: child table
  - 5,822 records
  - primary key: `order_id`
  - foreign key: `account_id`
  - feature columns: `bank_to`, `account_to`, `amount`, `k_symbol`
- `loan`: child table
  - 606 records
  - primary key: `loan_id`
  - foreign key: `account_id`
  - feature columns: `date`, `amount`, `duration`, `payments`, `status`
- `trans`: child table
  - 388,249 records
  - primary key: `trans_id`
  - foreign key: `account_id`
  - feature columns: `date`, `type`, `operation`, `amount`, `balance`, `k_symbol`, `bank`
  - maximum 100 transactions per client
The following parameters were used for all runs:
- batch size: 512
- training steps: 20,000
Model Size | CPU: Time per step (s) | CPU: RAM (GiB) | GPU: Time per step (s) | GPU: VRAM (GiB) | 4x GPUs: Time per step (s) | 4x GPUs: VRAM max per GPU (GiB) |
---|---|---|---|---|---|---|
Small | 18.6 | 5.6 | 0.63 | 3.4 | 0.26 | 4.6 |
Medium | 34.5 | 7.9 | 1.09 | 6.4 | 0.32 | 10.4 |
Large | 63.7 | 14.2 | 1.91 | 15.5 | 0.50 | 16.0 |
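The speed-ups implied by the table can be computed directly, as the CPU time per step divided by the GPU time per step:

```python
# Per-step times (s) from the table above: (CPU, 1x GPU, 4x GPUs).
times = {
    "Small": (18.6, 0.63, 0.26),
    "Medium": (34.5, 1.09, 0.32),
    "Large": (63.7, 1.91, 0.50),
}

for size, (cpu, gpu1, gpu4) in times.items():
    # Speed-up relative to CPU training.
    print(f"{size}: 1x GPU {cpu / gpu1:.0f}x, 4x GPUs {cpu / gpu4:.0f}x")
```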
### Dataset Porto

The Porto dataset contains taxi trajectories recorded over one year (from 2013/07/01 to 2014/06/30) in the city of Porto, Portugal. The original dataset consists of a single table, with the column `POLYLINE` containing all the GPS coordinates of each trip. We prefer to split the data into two tables, with a parent table `trip` containing some trip features, and a child table `trajectory` containing each individual GPS coordinate as a single row in the `coord` column:

- `trip`: parent table
  - 100,000 records
  - primary key: `TRIP_ID`
  - feature columns: `TAXI_ID`, `CALL_TYPE`, `TIMESTAMP`
- `trajectory`: child table
  - 4,377,175 records
  - foreign key: `TRIP_ID`
  - feature columns: `coord`
  - maximum 100 GPS records per trip
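The split described above can be sketched with pandas, unnesting `POLYLINE` so that each GPS point becomes one row of the child table. The toy DataFrame is a hypothetical stand-in for the original data (column names from the schema above, values invented):

```python
import pandas as pd

# Hypothetical input: each trip's GPS points stored as a list in POLYLINE.
df = pd.DataFrame({
    "TRIP_ID": [1, 2],
    "TAXI_ID": [7, 9],
    "POLYLINE": [[(-8.61, 41.14), (-8.62, 41.15)], [(-8.58, 41.16)]],
})

# Parent table: one row per trip with the trip-level features.
trip = df[["TRIP_ID", "TAXI_ID"]]

# Child table: one row per GPS coordinate, keyed by TRIP_ID.
trajectory = (df[["TRIP_ID", "POLYLINE"]]
              .explode("POLYLINE")
              .rename(columns={"POLYLINE": "coord"})
              .reset_index(drop=True))

print(len(trip), len(trajectory))  # 2 trips, 3 coordinates
```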
The following parameters were used for all runs:
- batch size: 2,048
- training steps: 20,000
Model Size | CPU: Time per step (s) | CPU: RAM (GiB) | GPU: Time per step (s) | GPU: VRAM (GiB) | 4x GPUs: Time per step (s) | 4x GPUs: VRAM max per GPU (GiB) |
---|---|---|---|---|---|---|
Small | 12.0 | 6.7 | 0.37 | 3.2 | 0.26 | 4.4 |
Medium | 24.6 | 9.5 | 0.70 | 5.9 | 0.22 | 7.1 |
Large | 48.7 | 14.9 | 1.30 | 11.4 | 0.35 | 12.6 |