Example Workflows

Here you can find example workflows that combine two or more REST API endpoints to perform common operations.

Upload a CSV file containing a single-table dataset and download a CSV file containing a synthetic dataset

  • Upload a dataset as a CSV file
  • (optional) Detect its schema
  • Create a source catalog that reads from the file
  • Create a generator that reads from the source catalog
  • Use the generator to run an execution
  • Download a ZIP file containing the synthetic dataset as a CSV file

Assumptions

We assume:

  • Our requests contain the required Authorization header (see the OpenAPI Documentation for more information)
  • We have a CSV file named sample.csv containing the following text:
One,Two,Three
1,2.32,abc
2,2.55,bce
3,2.42,ced
4,2.74,def
5,7.32,efg
6,7.61,fgh
7,2.29,hij
8,3.34,ijk
9,5.85,jkl
10,4.42,klm
11,3.11,lmn

Upload a dataset as a CSV file

First we need to upload the dataset to the platform. We do so with a POST /api/v1/catalogs/uploads request with a multipart/form-data payload containing the file:

-----------------------------4498002542570133167595167108
Content-Disposition: form-data; name="file"; filename="sample.csv"
Content-Type: text/csv

One,Two,Three
1,2.32,abc
2,2.55,bce
3,2.42,ced
4,2.74,def
5,7.32,efg
6,7.61,fgh
7,2.29,hij
8,3.34,ijk
9,5.85,jkl
10,4.42,klm
11,3.11,lmn
-----------------------------4498002542570133167595167108--

and receive a 200 OK response with payload:

{
"status": "ok",
"data": {
"id": "aDJHVmM0SFFWaEZnOlpyQjVvTEF2bFV0ZkZ0eklSMXp6MkZmqjRjkNldEeLPGuF-6Pq5fgHAFxAIUe_P7Xkk-ZH01oFiJk8",
...
}
}
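
The upload payload above can be assembled by hand. The following is a minimal sketch using only the Python standard library. The helper names build_multipart and make_upload_request, the base_url parameter, and the extra_headers parameter are assumptions made for illustration and are not part of the platform. The sketch builds the request but does not send it.

```python
import urllib.request
import uuid


def build_multipart(field_name, filename, content, content_type="text/csv", boundary=None):
    """Return (body, content_type_header) for a one-file multipart/form-data payload."""
    boundary = boundary or uuid.uuid4().hex
    part_headers = "\r\n".join([
        f"--{boundary}",
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"',
        f"Content-Type: {content_type}",
    ])
    # A blank line (CRLF CRLF) separates the part headers from the file content.
    head = (part_headers + "\r\n\r\n").encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def make_upload_request(base_url, csv_bytes, extra_headers=None):
    """Build (but do not send) the POST /api/v1/catalogs/uploads request."""
    body, content_type = build_multipart("file", "sample.csv", csv_bytes)
    request = urllib.request.Request(
        base_url + "/api/v1/catalogs/uploads", data=body, method="POST"
    )
    request.add_header("Content-Type", content_type)
    # extra_headers is where the required Authorization header would go (assumption:
    # its exact value is described in the OpenAPI documentation, not in this example).
    for name, value in (extra_headers or {}).items():
        request.add_header(name, value)
    return request
```

Passing the returned request to urllib.request.urlopen would send it. The id of the uploaded file would then be read from the data.id field of the JSON response.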

(optional) Detect its schema

To read the dataset correctly, the platform needs to know its schema. The platform can detect the schema automatically, but this step is optional if we already know the schema or can determine it ourselves. For this example, we will detect it.

NOTE: detection is heuristic, so the result must be reviewed for inaccuracies.

We make a POST /api/v1/catalogs/create/introspect request with payload:

{
"type": "rel_file_src",
"config": {
"files": [
"aDJHVmM0SFFWaEZnOlpyQjVvTEF2bFV0ZkZ0eklSMXp6MkZmqjRjkNldEeLPGuF-6Pq5fgHAFxAIUe_P7Xkk-ZH01oFiJk8"
]
}
}

and receive a 200 OK response with payload:

{
"status": "ok",
"data": {
"schema": {
"tables": [
{
"name": "sample",
"columns": [
{
"name": "One",
"type": {
"nullable": true,
"type": "integer"
},
"primary": false
},
{
"name": "Two",
"type": {
"nullable": true,
"type": "numeric"
},
"primary": false
},
{
"name": "Three",
"type": {
"nullable": true,
"caseInsensitive": false,
"type": "categorical"
},
"primary": false
}
],
"foreign": []
}
]
}
}
}

We review it and confirm it matches our data correctly.
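
Part of this review can be scripted. The sketch below is one possible check, not a platform feature. It flattens the detected schema into a column-to-type mapping and lists any CSV header columns that the detection missed. The function names column_types and missing_columns are hypothetical.

```python
import csv
import io


def column_types(introspect_response):
    """Map each table name to a {column name: detected type} dictionary."""
    tables = introspect_response["data"]["schema"]["tables"]
    return {
        table["name"]: {col["name"]: col["type"]["type"] for col in table["columns"]}
        for table in tables
    }


def missing_columns(csv_text, detected_columns):
    """Return the CSV header names that are absent from the detected columns."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in detected_columns]
```

For the response shown above, column_types would report the types integer, numeric and categorical for the columns One, Two and Three of the table sample. Whether those types are appropriate for the data still has to be judged by a person.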

Create a source catalog that reads from the file

We make a POST /api/v1/catalogs request with payload containing:

  • config: the same input we used for the introspection (schema detection) step
  • schema: the schema we received from the introspection (schema detection) step
    (or that we determined on our own if we skipped it)
{
"type": "rel_file_src",
"config": {
"files": [
"aDJHVmM0SFFWaEZnOlpyQjVvTEF2bFV0ZkZ0eklSMXp6MkZmqjRjkNldEeLPGuF-6Pq5fgHAFxAIUe_P7Xkk-ZH01oFiJk8"
]
},
"schema": {
"tables": [
{
"columns": [
{
"name": "One",
"type": {
"nullable": true,
"type": "integer"
},
"primary": false
},
{
"name": "Two",
"type": {
"nullable": true,
"type": "numeric"
},
"primary": false
},
{
"name": "Three",
"type": {
"nullable": true,
"caseInsensitive": false,
"type": "categorical"
},
"primary": false
}
],
"foreign": [],
"name": "sample"
}
]
}
}

and receive a 200 OK response with payload:

{
"status": "ok",
"data": {
"id": "h56Q4wRMcvhjvhMQPWXJMXjcP2",
...
}
}
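
Because the catalog payload reuses the introspection input and adds the schema, both payloads can be produced from the same values. The following sketch assumes hypothetical helper names (introspect_payload, catalog_payload) and only builds the JSON bodies. It does not send any request.

```python
import json


def introspect_payload(file_id):
    """Body for POST /api/v1/catalogs/create/introspect, as shown in this example."""
    return {"type": "rel_file_src", "config": {"files": [file_id]}}


def catalog_payload(file_id, schema):
    """Body for POST /api/v1/catalogs: the introspection input plus the schema."""
    payload = introspect_payload(file_id)
    payload["schema"] = schema
    return payload


def to_json_bytes(payload):
    """Serialize a payload as UTF-8 JSON, ready to be used as a request body."""
    return json.dumps(payload).encode("utf-8")
```

Here file_id is the data.id value returned by the upload step, and schema is either the data.schema object returned by the detection step or a schema written by hand.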

Create a generator that reads from the source catalog

In order to generate synthetic data we need a generator that uses our source catalog. We create it with a POST /api/v1/generators request with payload:

{
"sourceId": "h56Q4wRMcvhjvhMQPWXJMXjcP2",
"config": {
"tables": [
{
"columns": [
{
"name": "One",
"type": {
"type": "integer"
}
},
{
"name": "Two",
"type": {
"type": "numeric"
}
},
{
"name": "Three",
"type": {
"type": "categorical"
}
}
],
"name": "sample"
}
]
}
}

and receive a 200 OK response with payload:

{
"status": "ok",
"data": {
"id": "xXFQV82VG84wphJrR3f9MHmhgw",
...
}
}

Generators can be configured with many additional parameters. In this scenario their default values are sufficient, so we omit them.
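
In this example, the generator config repeats the table and column names of the catalog schema and keeps only the type name of each column. A minimal sketch of that mapping follows. It is inferred from the two payloads shown on this page, not from a documented rule, and the function name generator_payload is hypothetical.

```python
def generator_payload(source_id, schema):
    """Build the POST /api/v1/generators body from a source catalog ID and its schema."""
    tables = []
    for table in schema["tables"]:
        columns = [
            # Keep only the column name and the type name; fields such as
            # "nullable" and "primary" do not appear in the example payload.
            {"name": col["name"], "type": {"type": col["type"]["type"]}}
            for col in table["columns"]
        ]
        tables.append({"columns": columns, "name": table["name"]})
    return {"sourceId": source_id, "config": {"tables": tables}}
```

The source_id argument is the data.id value returned when the source catalog was created.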

Use the generator to run an execution

We make a POST /api/v1/generators/xXFQV82VG84wphJrR3f9MHmhgw/executions request with no payload and receive a 200 OK response with payload:

{
"status": "ok",
"data": {
"id": "C62jpV9JFX78HgC3VGmMqQgx6q",
"status": "pending",
...
}
}

Since status is "pending", we wait some time and then make a

GET /api/v1/generators/xXFQV82VG84wphJrR3f9MHmhgw/executions/C62jpV9JFX78HgC3VGmMqQgx6q

request and receive a 200 OK response with payload:

{
"status": "ok",
"data": {
"status": "completed",
...
}
}

This time status is "completed". Otherwise, we would wait and repeat the request.
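
The wait-and-repeat step can be written as a small polling loop. The sketch below is one way to do it, under these assumptions: fetch_status is a caller-supplied function that performs the GET request and returns the data.status string, and the interval and timeout defaults are arbitrary. Only the "pending" and "completed" values appear on this page, so the timeout also guards against any other status that never becomes "completed".

```python
import time


def wait_until_completed(fetch_status, interval=5.0, timeout=600.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns "completed"; return the number of polls."""
    deadline = clock() + timeout
    polls = 0
    while True:
        polls += 1
        if fetch_status() == "completed":
            return polls
        # Give up once the deadline has passed instead of polling forever.
        if clock() >= deadline:
            raise TimeoutError(f"execution not completed after {polls} polls")
        sleep(interval)
```

The sleep and clock parameters exist only so that the loop can be exercised without real waiting. In normal use they keep their defaults.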

Download a ZIP file containing the synthetic dataset as a CSV file

We make a GET /api/v1/catalogs/cjg4XFmvW52CGvhp5v7XhRv3vg/download?fmt=csv request, where cjg4XFmvW52CGvhp5v7XhRv3vg is the ID of the catalog that holds the synthetic data, and receive a 200 OK response with payload:

{
"status": "ok",
"data": {
"url": "/api/v1/cd/Z0pMeDlMbksxS29wTWxzR2RpbFRzNjNydUlyWU5JOWF4Qjl4TUQxVnVXN1JRYTVqc0JiQkNMZXRTYlFIOUhPR2aqFDs5Ylza6SUsFMgg2UvoRF4MObt-WKVJ2bM62npx63NStA/catalog_cjg4XFmvW52CGvhp5v7XhRv3vg.zip"
}
}

Here url is the location of a ZIP file. We download and extract it to find the sample.csv file, which contains the synthetic dataset in CSV format:

One,Two,Three
1,2.29,Dsa
7,3.51,Qsc
6,2.35,Ewq
8,2.29,Dsa
4,2.42,Edc
1,2.29,Dsa
5,3.32,Cxz
2,2.75,Edc
6,5.61,Qwe
6,3.61,Asd
5,3.14,Qwe
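
The last step can also be scripted. The sketch below resolves the relative url against the platform address and reads the CSV out of the downloaded archive in memory. It assumes that sample.csv sits at the top level of the ZIP file and is UTF-8 encoded. The function names and the base_url parameter are hypothetical, and the download itself is left to the caller.

```python
import csv
import io
import urllib.parse
import zipfile


def absolute_url(base_url, relative_url):
    """Resolve the relative "url" field of the response against the platform address."""
    return urllib.parse.urljoin(base_url, relative_url)


def read_csv_from_zip(zip_bytes, member="sample.csv"):
    """Return the rows of one CSV file inside a ZIP archive as a list of dictionaries."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Assumption: the member name and the UTF-8 encoding match the actual archive.
        text = archive.read(member).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

Each returned row maps the header names One, Two and Three to string values, so numeric columns need an explicit conversion before use.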