# Example Workflows

Here you can find example workflows that combine two or more REST API endpoints to perform common operations.
## Upload a CSV file containing a single-table dataset and download a CSV file containing a synthetic dataset

- Upload a dataset as a CSV file
- (optional) Detect its schema
- Create a source catalog that reads from the file
- Generate a synthetic dataset
  - Alternative A: create a generator and an execution with separate requests
  - Alternative B: create a generator and an execution with one request
- Download a ZIP file containing the synthetic dataset as a CSV file
### Assumptions

We assume:

- our requests contain the required `Authorization` header (see the OpenAPI Documentation for more information)
- we have a CSV file named `sample.csv` containing the following text:

```
One,Two,Three
1,2.32,abc
2,2.55,bce
3,2.42,ced
4,2.74,def
5,7.32,efg
6,7.61,fgh
7,2.29,hij
8,3.34,ijk
9,5.85,jkl
10,4.42,klm
11,3.11,lmn
```
### Upload a dataset as a CSV file

First we need to upload the dataset to the platform. We do so with a `POST /api/v1/catalogs/uploads` request with a `multipart/form-data` payload containing the file:

```
-----------------------------4498002542570133167595167108
Content-Disposition: form-data; name="file"; filename="sample.csv"
Content-Type: text/csv

One,Two,Three
1,2.32,abc
2,2.55,bce
3,2.42,ced
4,2.74,def
5,7.32,efg
6,7.61,fgh
7,2.29,hij
8,3.34,ijk
9,5.85,jkl
10,4.42,klm
11,3.11,lmn
-----------------------------4498002542570133167595167108--
```

and receive a `200 OK` response with payload:

```json
{
  "status": "ok",
  "data": {
    "id": "aDJHVmM0SFFWaEZnOlpyQjVvTEF2bFV0ZkZ0eklSMXp6MkZmqjRjkNldEeLPGuF-6Pq5fgHAFxAIUe_P7Xkk-ZH01oFiJk8",
    ...
  }
}
```
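The upload step can be sketched in Python using only the standard library. `BASE_URL` and `TOKEN` are placeholders for your deployment URL and credentials (our assumptions, not values from the API), and the helper names are ours:

```python
import json
import urllib.request

# Hypothetical placeholders -- replace with your deployment and token.
BASE_URL = "https://platform.example.com"
TOKEN = "Bearer <your-token>"

def build_multipart(filename: str, content: bytes, boundary: str) -> bytes:
    """Assemble a multipart/form-data body with a single "file" part."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: text/csv\r\n\r\n"
    ).encode()
    return head + content + f"\r\n--{boundary}--\r\n".encode()

def upload_csv(path: str) -> str:
    """POST the file to /api/v1/catalogs/uploads and return the upload id."""
    boundary = "----sketch-boundary-1234567890"
    with open(path, "rb") as fh:
        body = build_multipart(path, fh.read(), boundary)
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/catalogs/uploads",
        data=body,
        headers={
            "Authorization": TOKEN,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(req) as resp:  # requires a live platform
        return json.load(resp)["data"]["id"]
```

Calling `upload_csv("sample.csv")` performs the actual request and yields the `data.id` value used in the following steps.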
### (optional) Detect its schema

To correctly read data from the dataset, the platform needs to know its schema. The schema can be detected automatically by the platform, but this is not mandatory if we already know it or can determine it ourselves. For this example, we will detect it.

NOTE: the detection is heuristic, so the result must be reviewed to catch inaccuracies.

We make a `POST /api/v1/catalogs/create/introspect` request with payload:

```json
{
  "type": "rel_file_src",
  "config": {
    "files": [
      "aDJHVmM0SFFWaEZnOlpyQjVvTEF2bFV0ZkZ0eklSMXp6MkZmqjRjkNldEeLPGuF-6Pq5fgHAFxAIUe_P7Xkk-ZH01oFiJk8"
    ]
  }
}
```

and receive a `200 OK` response with payload:

```json
{
  "status": "ok",
  "data": {
    "schema": {
      "tables": [
        {
          "name": "sample",
          "columns": [
            { "name": "One", "type": { "nullable": true, "type": "integer" }, "primary": false },
            { "name": "Two", "type": { "nullable": true, "type": "numeric" }, "primary": false },
            { "name": "Three", "type": { "nullable": true, "caseInsensitive": false, "type": "categorical" }, "primary": false }
          ],
          "foreign": []
        }
      ]
    }
  }
}
```

We review it and confirm it matches our data correctly.
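The introspection request can be sketched as follows; `BASE_URL`, `TOKEN`, and the helper names are our placeholders, not part of the API:

```python
import json
import urllib.request

# Hypothetical placeholders -- replace with your deployment and token.
BASE_URL = "https://platform.example.com"
TOKEN = "Bearer <your-token>"

def introspect_payload(upload_id: str) -> dict:
    """Build the body for POST /api/v1/catalogs/create/introspect."""
    return {"type": "rel_file_src", "config": {"files": [upload_id]}}

def detect_schema(upload_id: str) -> dict:
    """Send the introspection request and return the detected schema."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/catalogs/create/introspect",
        data=json.dumps(introspect_payload(upload_id)).encode(),
        headers={"Authorization": TOKEN, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # requires a live platform
        return json.load(resp)["data"]["schema"]
```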
### Create a source catalog that reads from the file

We make a `POST /api/v1/catalogs` request with a payload containing:

- `config`: the same input we used for the introspection (schema detection) step
- `schema`: the schema we received from the introspection step (or the one we determined on our own if we skipped it)

```json
{
  "type": "rel_file_src",
  "config": {
    "files": [
      "aDJHVmM0SFFWaEZnOlpyQjVvTEF2bFV0ZkZ0eklSMXp6MkZmqjRjkNldEeLPGuF-6Pq5fgHAFxAIUe_P7Xkk-ZH01oFiJk8"
    ]
  },
  "schema": {
    "tables": [
      {
        "columns": [
          { "name": "One", "type": { "nullable": true, "type": "integer" }, "primary": false },
          { "name": "Two", "type": { "nullable": true, "type": "numeric" }, "primary": false },
          { "name": "Three", "type": { "nullable": true, "caseInsensitive": false, "type": "categorical" }, "primary": false }
        ],
        "foreign": [],
        "name": "sample"
      }
    ]
  }
}
```

and receive a `200 OK` response with payload:

```json
{
  "status": "ok",
  "data": {
    "id": "h56Q4wRMcvhjvhMQPWXJMXjcP2",
    ...
  }
}
```
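Since the catalog body is just the introspection input plus the reviewed schema, it can be assembled with a small helper (the helper name and stub values below are ours, for illustration only):

```python
def catalog_payload(source: dict, schema: dict) -> dict:
    """Combine the source description used for introspection with the
    reviewed schema into the body for POST /api/v1/catalogs."""
    return {**source, "schema": schema}

# Stub values standing in for the real upload id and detected schema.
source = {"type": "rel_file_src", "config": {"files": ["<upload-id>"]}}
schema = {"tables": [{"name": "sample", "columns": [], "foreign": []}]}
body = catalog_payload(source, schema)
```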
### Generate a synthetic dataset

In order to generate a synthetic dataset we need to create:

- a generator, which is trained on the source data
- an execution of that generator, which uses the trained generator to produce synthetic data

These can be created with separate requests or with a single request.

Creating executions separately from their generator is useful when the generator must be trained at one time but produce synthetic data later, or when it must be trained once but produce synthetic data several times without incurring the training overhead again for each synthetic dataset.

Creating both a generator and its (first) execution in one request is useful when there is no need for a delay between generator training and synthetic data generation; it also performs better, as the data is read only once.
#### Alternative A: create a generator and an execution with separate requests

##### Create a generator

In order to generate synthetic data we need a generator that uses our source catalog. We create it with a `POST /api/v1/generators` request with payload:

```json
{
  "sourceId": "h56Q4wRMcvhjvhMQPWXJMXjcP2",
  "config": {
    "tables": [
      {
        "columns": [
          { "name": "One", "type": { "type": "integer" } },
          { "name": "Two", "type": { "type": "numeric" } },
          { "name": "Three", "type": { "type": "categorical" } }
        ],
        "name": "sample"
      }
    ]
  }
}
```

and receive a `200 OK` response with payload:

```json
{
  "status": "ok",
  "data": {
    "id": "xXFQV82VG84wphJrR3f9MHmhgw",
    "status": "pending",
    ...
  }
}
```
Generators can be configured with additional parameters via the `config` parameter. In this tutorial their default values are acceptable, so we omit them.

Notice how the `status` attribute's value is `"pending"`. This is because generator creation is a long-running operation: the generator undergoes training when created. To confirm its completion we wait some time, then make a `GET /api/v1/generators/xXFQV82VG84wphJrR3f9MHmhgw` request and receive a `200 OK` response with payload:

```json
{
  "status": "ok",
  "data": {
    "id": "xXFQV82VG84wphJrR3f9MHmhgw",
    "status": "completed",
    ...
  }
}
```

This time `status` is `"completed"`.
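The wait-then-check pattern can be wrapped in a small polling helper. This is a sketch of ours, not a platform feature; `fetch` stands for any callable that performs the `GET` request and returns the response's `data` object:

```python
import time

def wait_until_completed(fetch, poll_seconds: float = 5.0, timeout: float = 600.0) -> dict:
    """Poll a long-running resource until its status leaves "pending".

    fetch: callable returning the resource's `data` dict on each call.
    Returns the final `data` dict, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = fetch()
        if data["status"] != "pending":
            return data
        time.sleep(poll_seconds)
    raise TimeoutError("resource did not finish within the timeout")
```

The same helper works for both generators and executions, since both expose the same `status` lifecycle.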
##### Create a synthetic dataset using the generator

We make a `POST /api/v1/generators/xXFQV82VG84wphJrR3f9MHmhgw/executions` request with no payload and receive a `200 OK` response with payload:

```json
{
  "status": "ok",
  "data": {
    "id": "C62jpV9JFX78HgC3VGmMqQgx6q",
    "status": "pending",
    ...
  }
}
```

Notice how the `status` attribute's value is `"pending"`. This is because execution creation is also a long-running operation. To confirm its completion we wait some time, then make a `GET /api/v1/generators/xXFQV82VG84wphJrR3f9MHmhgw/executions/C62jpV9JFX78HgC3VGmMqQgx6q` request and receive a `200 OK` response with payload:

```json
{
  "status": "ok",
  "data": {
    "status": "completed",
    "destination": { "id": "cjg4XFmvW52CGvhp5v7XhRv3vg" },
    ...
  }
}
```

This time `status` is `"completed"`. Notice how a `destination` attribute containing an `id` attribute is now present. This is the id of the catalog created on the platform to store the generated synthetic data.
Executions too can be configured with additional parameters via a `config` parameter. In this tutorial their default values are acceptable, so we omit them.
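When automating this step, the destination catalog id can be extracted defensively once the execution completes; a minimal sketch (the helper name is ours):

```python
def destination_id(execution_data: dict) -> str:
    """Return the destination catalog id of a completed execution's data."""
    if execution_data.get("status") != "completed":
        raise ValueError("execution has not completed yet; poll again later")
    return execution_data["destination"]["id"]
```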
#### Alternative B: create a generator and an execution with one request

The request is similar to alternative A, but includes an `executionRequest` parameter. This can contain the same data accepted by the execution creation endpoint we used previously. Since the default values are acceptable, we set it to an empty object, and make a `POST /api/v1/generators` request with payload:

```json
{
  "sourceId": "h56Q4wRMcvhjvhMQPWXJMXjcP2",
  "executionRequest": {},
  "config": {
    "tables": [
      {
        "columns": [
          { "name": "One", "type": { "type": "integer" } },
          { "name": "Two", "type": { "type": "numeric" } },
          { "name": "Three", "type": { "type": "categorical" } }
        ],
        "name": "sample"
      }
    ]
  }
}
```

and receive a `200 OK` response with payload:

```json
{
  "status": "ok",
  "data": {
    "id": "xXFQV82VG84wphJrR3f9MHmhgw",
    "status": "pending",
    "lastExecutionId": "C62jpV9JFX78HgC3VGmMqQgx6q",
    ...
  }
}
```

This is analogous to the response we received when creating a generator in alternative A. Notice how the response now includes a `lastExecutionId` attribute. This holds the id of the execution created alongside the generator, and it can be used in the same way as the execution id we received in alternative A.
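The only difference between the two alternatives' request bodies is the presence of `executionRequest`, so both can be produced by one builder; a sketch under that assumption (the function name is ours):

```python
def generator_payload(source_id: str, tables: list, execution_request=None) -> dict:
    """Build the body for POST /api/v1/generators.

    Passing execution_request (even an empty dict, for defaults) also
    schedules the generator's first execution in the same request.
    """
    payload = {"sourceId": source_id, "config": {"tables": tables}}
    if execution_request is not None:
        payload["executionRequest"] = execution_request
    return payload
```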
### Download a ZIP file containing the synthetic dataset as a CSV file

We make a `GET /api/v1/catalogs/cjg4XFmvW52CGvhp5v7XhRv3vg/download?fmt=csv` request and receive a `200 OK` response with payload:

```json
{
  "status": "ok",
  "data": {
    "url": "/api/v1/cd/Z0pMeDlMbksxS29wTWxzR2RpbFRzNjNydUlyWU5JOWF4Qjl4TUQxVnVXN1JRYTVqc0JiQkNMZXRTYlFIOUhPR2aqFDs5Ylza6SUsFMgg2UvoRF4MObt-WKVJ2bM62npx63NStA/catalog_cjg4XFmvW52CGvhp5v7XhRv3vg.zip"
  }
}
```

Here `url` contains the URL of a ZIP file that can be downloaded and extracted to find the `sample.csv` file containing the CSV synthetic dataset:

```
One,Two,Three
1,2.29,Dsa
7,3.51,Qsc
6,2.35,Ewq
8,2.29,Dsa
4,2.42,Edc
1,2.29,Dsa
5,3.32,Cxz
2,2.75,Edc
6,5.61,Qwe
6,3.61,Asd
5,3.14,Qwe
```
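Assuming the ZIP has been downloaded from the returned `url` (with the same `Authorization` header as the other requests), its CSV members can be extracted in memory with the standard library; a sketch:

```python
import io
import zipfile

def extract_csvs(zip_bytes: bytes) -> dict:
    """Map each CSV member of the downloaded ZIP to its decoded text."""
    out = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(".csv"):
                out[name] = zf.read(name).decode("utf-8")
    return out
```

Keeping everything in memory avoids writing temporary files; for very large synthetic datasets, extracting to disk with `ZipFile.extractall` may be preferable.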