
Ingest from SQL Workflow

This workflow allows you to ingest data from a PostgreSQL database for which you have credentials.

For more detailed information about what this workflow does, see the ingest-from-sql documentation on GitHub.

Creating the workflow

Before ingesting, you will first need to gather some credentials and learn a little about the layout of your data.

Gathering Credentials

The credentials you will need are:

  • Hostname
  • Port
  • Username
  • Password
  • Database
note

The host must be accessible from the internet. You can check this by testing your credentials with psql or pgAdmin from a system on a home internet connection.
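You can also script a quick connectivity check. The sketch below builds a libpq-style connection string from the credential fields; the values shown are placeholders, and the actual connection test (commented out) assumes the third-party psycopg2 package is installed.

```python
# Hypothetical credentials -- substitute your own values.
creds = {
    "host": "db.example.com",
    "port": 5432,
    "user": "ingest_user",
    "password": "secret",
    "dbname": "sales",
}

# Assemble a libpq-style DSN from the credential fields.
dsn = " ".join(f"{k}={v}" for k, v in creds.items())
print(dsn)

# To actually test connectivity (pip install psycopg2-binary):
# import psycopg2
# with psycopg2.connect(dsn) as conn:
#     with conn.cursor() as cur:
#         cur.execute("SELECT 1")
#         assert cur.fetchone() == (1,)
```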

note

If you weren't given a port, it's probably the default (5432), in which case you won't need to add it.

Determining What to Ingest

If you don't know the layout of your database, don't worry: you can simply accept the defaults for the rest. If the results aren't what you hoped for, resetting lets you try again from scratch.

If you have a table that you know has images or PDFs in it, you can specify those when starting the workflow.

If there are any issues with tables that have images or PDFs, see Troubleshooting.

Starting the workflow

For general information on creating workflows in ApertureDB Cloud see Creating and Deleting Workflows.

Configure your workflow by selecting "Ingest from SQL":

In Workflows choose SQL

The first fields are the required settings you gathered above:

SQL Options setup dialog

Once you have filled in the credentials, scroll down to see some additional options for configuring image and PDF tables. If you know the names of tables that contain those items, add them here. Likewise, if there are tables or columns you want the ingest to ignore, you can add those here too.

Advanced SQL Options dialog

Once you have filled in the fields, click "Submit". Your workflow will be created and will start running.

See the results

Results will start being available in your database as soon as your workflow status is 'Started'.

To view data you have ingested, go to the Web UI for your instance.

Resetting

If data wasn't imported the way you want, for example if an image table was imported as plain entities, you can remove all the data that this workflow ingested from a specific host and database. The following query retrieves the keys for previous workflow runs:

[{
    "FindEntity": {
        "_ref": 1,
        "with_class": "WorkflowSpec",
        "constraints": {
            "workflow_name": ["==", "sql-loader"]
        }
    }
}, {
    "FindEntity": {
        "is_connected_to": {"ref": 1},
        "results": {
            "list": ["workflow_id"]
        }
    }
}]
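The query returns one result object per command. If you run it programmatically, you can pull the workflow_id values out of the second command's results; the response shape and the ID below are illustrative, not taken from a real run.

```python
# Illustrative response for the query above (IDs are made up).
response = [
    {"FindEntity": {"returned": 1, "status": 0}},
    {"FindEntity": {
        "entities": [{"workflow_id": "sql-loader-1234"}],
        "returned": 1,
        "status": 0,
    }},
]

# Extract the workflow_id values from the second command's results.
ids = [e["workflow_id"] for e in response[1]["FindEntity"]["entities"]]
print(ids)
```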

This is an example of what the results will look like:

SQL Creator Key

The value returned for workflow_id is what you are looking for. Save it, and replace YOUR_ID_HERE with it in the query below.

Then, you can run a query to remove all Entities, Images and Blobs that were ingested by it:

[{
    "FindEntity": {
        "_ref": 1,
        "with_class": "WorkflowRun",
        "constraints": {"workflow_id": ["==", YOUR_ID_HERE]}
    }
}, {
    "FindEntity": {
        "_ref": 2,
        "is_connected_to": {"ref": 1}
    }
}, {
    "FindImage": {
        "_ref": 3,
        "is_connected_to": {"ref": 1}
    }
}, {
    "FindBlob": {
        "_ref": 4,
        "is_connected_to": {"ref": 1}
    }
}, {
    "DeleteEntity": {"ref": 2}
}, {
    "DeleteImage": {"ref": 3}
}, {
    "DeleteBlob": {"ref": 4}
}]
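Rather than hand-editing the JSON, you can generate the deletion query with the ID filled in. A minimal Python sketch; the helper name and the ID used here are hypothetical:

```python
import json

def build_cleanup_query(workflow_id: str) -> list:
    """Build the deletion query above with the given workflow_id filled in."""
    return [
        {"FindEntity": {"_ref": 1, "with_class": "WorkflowRun",
                        "constraints": {"workflow_id": ["==", workflow_id]}}},
        {"FindEntity": {"_ref": 2, "is_connected_to": {"ref": 1}}},
        {"FindImage": {"_ref": 3, "is_connected_to": {"ref": 1}}},
        {"FindBlob": {"_ref": 4, "is_connected_to": {"ref": 1}}},
        {"DeleteEntity": {"ref": 2}},
        {"DeleteImage": {"ref": 3}},
        {"DeleteBlob": {"ref": 4}},
    ]

# Hypothetical ID -- use the workflow_id you saved earlier.
query = build_cleanup_query("sql-loader-1234")
print(json.dumps(query, indent=2))
```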

The response will show a "count" of the objects deleted by each command. If the counts are zero, check that you entered the workflow_id correctly, or that you haven't already deleted everything!

Troubleshooting

If the workflow fails to load any data, first check the following:

  • the hostname is spelled correctly
  • the server can be accessed from the network where the workflow is hosted (e.g. Google Cloud)
  • the database name is correct
  • the username and password are correct
  • the user has the permissions required to access the database

If an image or PDF table fails to ingest, check that it has a blob (BYTEA in PostgreSQL) column.
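One way to check is to query PostgreSQL's information schema for the table's BYTEA columns. The snippet below just builds the SQL statement, which you can run with psql or any PostgreSQL client; the table name is a placeholder.

```python
# Build SQL listing the binary (BYTEA) columns of a table via the
# information schema. The table name "images" is a placeholder --
# substitute the name of your image or PDF table.
table = "images"
sql = (
    "SELECT column_name FROM information_schema.columns "
    f"WHERE table_name = '{table}' AND data_type = 'bytea';"
)
print(sql)
```

If the statement returns no rows, the table has no blob column for the workflow to ingest.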

If problems persist, join our Slack workspace for additional help.