Ingest from Bucket Workflow

This workflow allows you to ingest data from an AWS or GCP bucket for which you have credentials.

This video demonstrates how to ingest data into ApertureDB from a cloud bucket.

Creating the workflow

Creating and deleting workflows
For general information on creating workflows in ApertureDB Cloud see Creating and Deleting Workflows.

This workflow supports multiple different cloud providers.

This workflow allows you to ingest objects, such as images, videos, and PDFs, from a cloud bucket (for example, AWS Simple Storage Service (S3)) into ApertureDB. This lets you use your own existing data, and provides an easy way to get started with ApertureDB and to see how it can be used with real data.

  1. Enter the AWS S3 bucket name. You may be able to list your S3 buckets using aws s3 ls or by using the AWS S3 Console.
  2. Enter the AWS access key. See AWS documentation for details on how to generate these keys or the Getting Credentials section on this page. It is important that the credentials have the appropriate permissions to access the bucket; see the Setting Permissions section on this page. For your own security, you may wish to generate new keys for this purpose.
  3. Enter the AWS secret access key; this is a password field.
  4. Decide whether you want to process images, videos, or PDFs. You can choose as many as you want, but you should select at least one.
  5. Click the blue button at the bottom to launch the workflow.
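Before submitting the form in step 1, it can help to confirm the bucket name is even a legal S3 name. The sketch below is a hypothetical client-side check (not part of the workflow) based on AWS's published bucket naming rules: 3-63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no consecutive dots.

```python
import re

# Sketch: pre-validate an S3 bucket name before entering it in the form.
# Covers the core AWS naming rules; not an exhaustive validator.
BUCKET_RE = re.compile(r"^(?!.*\.\.)[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic S3 bucket naming rules."""
    return bool(BUCKET_RE.match(name))
```

A failing check here usually means a typo, or that you pasted a full `s3://` URI instead of the bare bucket name.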
See the GitHub repository for more information
For more detailed information about what this workflow is doing, additional information about the parameters, and how to run the workflow outside of the ApertureDB Cloud, see the ingest-from-bucket documentation in GitHub.

See the results

Results will start being available in your database as soon as the workflow status is 'Started'.

To view data you have ingested, go to the Web UI for your instance.
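Besides browsing the Web UI, you can spot-check an ingestion with a count query in ApertureDB's JSON query language. The sketch below only constructs the query document; actually running it requires a connected client (for example, the aperturedb Python package), which is assumed here.

```python
# Sketch of an ApertureDB JSON query that counts ingested images.
# Executing it requires a connected client; here we only build the
# query document to show its shape.
count_images_query = [
    {"FindImage": {"results": {"count": True}}}
]
```

Analogous queries with FindVideo or FindBlob can be used if you ingested videos or PDFs.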

Getting Credentials

If you need help obtaining the proper credentials for your bucket, the following hints may help users with standard configurations.

From the AWS Console, first type 'IAM' into the search box at the top:

Locate IAM

Select 'IAM', and find 'Users' in the menu on the left.

Locate Users in IAM

Once you select 'Users', find the user that you will use to access the data. Use search if you have many users. Click on the link in the 'User name' column.

Select User

Once you are on the page for the user, click the 'Security credentials' tab in the main content area.

Find Security Credentials

Now scroll down until you see a section labeled 'Access keys', and choose 'Create access key'.

Select Create Access Key

Now choose 'Application running outside AWS' and click 'Next'.

Select Create Access Key

Choose a name that will mean something to you, and click 'Create access key'.

Name Access Key

Now retrieve your access key information, either by copying the access key and secret access key, or by downloading the CSV file.

Name Access Key

Once you no longer need your key, delete it.

Delete Access Key

Setting Permissions

Permission management is critical for allowing access to your data, and we strive to use minimal permissions for our workflows.

Giving a user ReadOnlyAccess is a simple way to provide adequate access.

The minimal access required is as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR_BUCKET_NAME",
        "arn:aws:s3:::YOUR_BUCKET_NAME/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets"
      ],
      "Resource": "*"
    }
  ]
}
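If you manage several buckets, it may be convenient to render this minimal policy for a given bucket name rather than editing the JSON by hand. A sketch (the bucket name is a placeholder; the output mirrors the policy above):

```python
import json

def minimal_s3_read_policy(bucket_name: str) -> str:
    """Render the minimal read-only IAM policy for one bucket (sketch)."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

The resulting JSON can be pasted directly into the IAM policy editor.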

ListAllMyBuckets is used to verify that the account credentials have been supplied correctly and to aid in detecting misconfiguration of bucket names or permissions.