Ingest Data and Metadata

ApertureDB allows users to store metadata as well as data such as images, videos, audio, documents, or any other unstructured type. There are three methods available for data ingestion via the Python SDK.

Ultimately, each of the following methods generates JSON queries that are then executed against a specified ApertureDB instance.
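
For orientation, the sketch below builds the kind of JSON query these methods ultimately produce: a list of commands, each keyed by its command name. It assumes the `AddEntity` command shape with `class` and `properties` fields; the helper `build_add_entity_query` and the `Person` sample data are hypothetical, for illustration only.

```python
import json

def build_add_entity_query(entity_class, properties):
    """Return a one-command query as a list of dicts (hypothetical helper).

    Assumes the AddEntity shape {"class": ..., "properties": {...}};
    consult the query language reference for the full set of options.
    """
    return [{"AddEntity": {"class": entity_class, "properties": dict(properties)}}]

# Hypothetical sample data.
query = build_add_entity_query("Person", {"name": "Alice", "age": 34})
print(json.dumps(query))
# prints: [{"AddEntity": {"class": "Person", "properties": {"name": "Alice", "age": 34}}}]
```
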

Let's go over these scenarios and the subtle nuances of each. They are ordered from the least to the most familiarity with ApertureDB that they require.

Ingestion based on the data model

If the schema can be expressed as a Pydantic model, one option is to reuse that existing schema, either by deriving it from a different base class or by adding that class as a mixin.

Pros:

  • It may be the quickest option to get started.

Cons:

  • It is currently the least tested option; some things might not work out of the box.
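
The idea can be sketched without the SDK installed. Below, a standard-library dataclass stands in for a Pydantic model, and a hypothetical `AddEntityMixin` turns each instance into an `AddEntity` command. The mixin name, its `to_query` method, and the choice of the class name as the entity class are assumptions for illustration; the SDK's actual base classes may behave differently.

```python
from dataclasses import dataclass, fields

class AddEntityMixin:
    """Hypothetical mixin: converts a dataclass instance into an AddEntity query.

    The class name becomes the entity class and each field becomes a property.
    """
    def to_query(self):
        props = {f.name: getattr(self, f.name) for f in fields(self)}
        return [{"AddEntity": {"class": type(self).__name__, "properties": props}}]

@dataclass
class Book(AddEntityMixin):  # the existing schema, with the mixin added
    title: str
    year: int

print(Book(title="Dune", year=1965).to_query())
```

Because the schema class itself is unchanged apart from the extra base class, the same model can keep serving the rest of an application.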

Ingestion using data from CSV files

This is the most thoroughly tested way of getting data into ApertureDB. There is a specific CSV format for each type of object that ApertureDB natively understands.

The following parsers are currently supported:

| ADB object type | CSV format to be used |
|---|---|
| BLOB | BlobDataCSV |
| BOUNDING_BOX | BBoxDataCSV |
| CONNECTION | ConnectionDataCSV |
| DESCRIPTOR | DescriptorDataCSV |
| DESCRIPTORSET | DescriptorSetDataCSV |
| ENTITY | EntityDataCSV |
| IMAGE | ImageDataCSV |
| POLYGON | PolygonDataCSV |
| VIDEO | VideoDataCSV |

Pros:

  • Most reliable.
  • Only an understanding of the CSV format is required.

Cons:

  • Requires generating a CSV file as an intermediate step.
  • Hierarchical data must be flattened to fit the CSV format.
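
To illustrate what a CSV parser does conceptually, the sketch below turns CSV rows into one `AddEntity` query per row. It assumes an EntityDataCSV-like layout where the first column, `EntityClass`, names the entity class and the remaining columns are properties; `rows_to_queries`, `coerce`, and the sample data are hypothetical and much simpler than the SDK's parser.

```python
import csv
import io

# Sample CSV in the layout assumed here: "EntityClass" first, then properties.
CSV_TEXT = """EntityClass,name,age
Person,Alice,34
Person,Bob,29
"""

def coerce(value):
    """Convert numeric-looking strings to int or float; leave others as str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

def rows_to_queries(csv_text):
    """Hypothetical mini-parser: build one AddEntity query per CSV row."""
    queries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entity_class = row.pop("EntityClass")
        props = {key: coerce(val) for key, val in row.items()}
        queries.append([{"AddEntity": {"class": entity_class, "properties": props}}])
    return queries

for q in rows_to_queries(CSV_TEXT):
    print(q)
```

Note how every property must fit in a flat column, which is why hierarchical data has to be flattened first.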

Custom defined data generators

These are the most free-form generators supported. They are simply subclasses of Subscriptable that implement a very small interface, yet the flexibility they offer makes them suitable for plugging arbitrary data sources into ApertureDB and performing bespoke customizations.

Example implementations:

Pros:

  • Most flexible in terms of the queries that are generated.
  • Can be used to plug in arbitrary data sources.

Cons:

  • Needs a good understanding of ApertureDB's query language.
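
The sketch below shows the general shape of such a generator. So that it runs without the SDK, `SubscriptableSketch` is a stand-in that assumes the interface described above: subclasses implement `__len__` and `getitem(idx)`, with `getitem` returning a `(query, blobs)` pair. Check the SDK's `Subscriptable` for the exact contract. `ImageRecordGenerator`, the `source_id` property, and the sample records are hypothetical.

```python
class SubscriptableSketch:
    """Stand-in for the SDK's Subscriptable (assumed interface).

    Subclasses implement __len__ and getitem(idx); indexing, slicing and
    iteration are derived from those two methods here.
    """
    def __getitem__(self, idx):
        if isinstance(idx, slice):
            return [self.getitem(i) for i in range(*idx.indices(len(self)))]
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError("index out of range")  # also ends for-loop iteration
        return self.getitem(idx)

    def __len__(self):
        raise NotImplementedError

    def getitem(self, idx):
        raise NotImplementedError


class ImageRecordGenerator(SubscriptableSketch):
    """Hypothetical generator over in-memory records (e.g. rows from an API)."""
    def __init__(self, records):
        self.records = records

    def __len__(self):
        return len(self.records)

    def getitem(self, idx):
        record = self.records[idx]
        query = [{"AddImage": {"properties": {"source_id": record["id"]}}}]
        blobs = [record["bytes"]]  # raw image bytes travel alongside the query
        return query, blobs


gen = ImageRecordGenerator([
    {"id": "a1", "bytes": b"\x89PNG..."},
    {"id": "b2", "bytes": b"\xff\xd8..."},
])
for query, blobs in gen:
    print(query, len(blobs))
```

Since each item is built on demand in `getitem`, the source can be a database cursor, an API, or files on disk without changing how the items are consumed.
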

Tip: ApertureDB's command-line tool, adb, provides subcommands to ingest data from CSV files or with data generators.

Ingest data using JSON commands

The native JSON query language provides Add and Update commands for every object type that ApertureDB supports. These commands can be sent from any of our Python or C++ clients, Jupyter notebooks, or even the Web UI (custom query tab) to ingest small amounts of data. However, achieving high ingestion throughput and handling every response type this way takes considerable work, which is why we offer the methods described above, along with ParallelQuery.

Pros:

  • Most flexible in terms of the queries that are generated.

Cons:

  • Needs a good understanding of ApertureDB's query language.
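
To show why throughput takes work, here is a minimal sketch of the batching, parallel execution, and response checking that ParallelQuery is meant to take off your hands. `run_query` is a hypothetical callable standing in for a client's query method, and `fake_run_query` replaces a real connection. The check assumes each command's response carries a `status` field where 0 means success.

```python
from concurrent.futures import ThreadPoolExecutor

def batched(commands, batch_size):
    """Split a flat list of commands into batches of at most batch_size."""
    return [commands[i:i + batch_size] for i in range(0, len(commands), batch_size)]

def ingest(commands, run_query, batch_size=100, workers=4):
    """Send batches concurrently; return (succeeded, failed) command counts.

    run_query(batch) is assumed to return one response dict per command,
    e.g. [{"AddEntity": {"status": 0}}], where status 0 means success.
    """
    succeeded = failed = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for responses in pool.map(run_query, batched(commands, batch_size)):
            for response in responses:
                (result,) = response.values()  # one command name per response
                if result.get("status") == 0:
                    succeeded += 1
                else:
                    failed += 1
    return succeeded, failed

def fake_run_query(batch):
    """Hypothetical runner for demonstration: rejects entities named "bad"."""
    return [{name: {"status": -1 if body["properties"].get("name") == "bad" else 0}}
            for cmd in batch for name, body in cmd.items()]

commands = [{"AddEntity": {"class": "Person", "properties": {"name": n}}}
            for n in ["a", "b", "bad", "c", "d"]]
print(ingest(commands, fake_run_query, batch_size=2))  # prints: (4, 1)
```

A production version would also need retries, connection handling, and batch-level error handling, which this sketch omits.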

Matrix of choices vs implications

| Option | QL familiarity | Data/CSV familiarity | Maturity |
|---|---|---|---|
| Model | low | low | low |
| CSV Parser | medium | high | high |
| Query Generator | high | low | medium |
| JSON Commands | high | low | high |

Tip: Talk to us about cron jobs or Airflow ingestion pipelines for setting up periodic data loading or updates.