Skip to main content

Labeling

Labeling is a common task in supervised machine learning involving multiple media types and many types of annotations. The process leads to even more information that needs to be stored with its corresponding media. Most labeling task are still very human driven because of the requirements for accuracy. For more detail read about Data Labeling and Annotation.

Because it has so many moving parts, keeping all the items organized, sychronized and searchable can be hard. ApertureDB makes this easy.

General How-to

Before diving in, it is useful to decide how you want your pipeline to work, first identify where in your pipeline ApertureDB will reside. Given the design of ApertureDB, it can serve as either a source or a destination for pipeline data or both. Once that is determined, then you can create connectors between the various components.

All labeling tools will require import of items to be labeled. Some will want the resources local to the labeling tool but it is common to have those served from a remote location to help avoid data silos.

Data can be imported to ApertureDB using our Python data loaders or a rest interface, depending on what method best suits the existing infrastructure and labeling requirements. A push based system in general will work best with a rest interface, and a pull based will favor a python data loader.

After you have imported your data to ApertureDB then you can create annotations task for your media in the the labeling system. Creating the annotation tasks is often very specific to the tool. The important part is ensuring whatever metadata the system will exports is enough for you to identify the media the annotations belong to. Naming media with a guid is a good way to ensure easy mapping, because often a labeling system will have internal ids which are generated by how it loads tasks in. You will then feed the media either through a URL or the binary data into the tool. ApertureDB can automate this by serving the images and using metadata to keep track of what media hasn't been imported into the tool.

The last step is getting labeled data from the tool. This again is very specifc to the tool, but data will be available in a structured format, which then is a simple task of mapping into the format in ApertureDB. Depending on the requirements and tool, data can be exported on a schedule or via a trigger from the tool.

Working with Label Studio

Here we discuss examples and information relevant to specific labeling tools. This serves as an example for any other tool you might be using. Please reach out to us (team@aperturedata.io) for specific details.

A basic labeling which imports images and produces labels will be the example used here; More complex mappings are supported, but are beyond the scope of this document.

Some familiarity with the Label Studio API is helpful, specifically the API for importing tasks and exporting labeling.

The task importing documentation can be a little confusing because the endpoint is overloaded; The easiest method for import is a file which lists just the resources to be imported, and contains no extra information. Extra information can of course be included by using a CSV file with columns that are mapped into the labeling UI by the label config and in particular, variables.

With Label Studio, when importing a list of images it is important to note that the tool will simply store those urls and then attempt to load them by JavaScript in the client. This means that setting up CORS appropriately is important. Our example shows how to simply satisify this � basically add a header of Access-Control-Allow-Origin: *. An open policy like this is perfect for testing, but security should be tightening down before production.

Since the Label Studio UI is web based, it expects media to be served over http(s); our Label Studio example serves the images with a simple Python script and generates the list of all the tasks to be added from this. You can serve these from a dedicated server like nginx, or you can serve them from within ApertureDB and use a simple REST gateway to automatically generate the list, and serve files.

Next you must import the list of tasks into LS. Our example shows how to use the API to load a single URL. You can easily change this to a CSV file if you need extra data. After loading the tasks, the next piece is exporting the labeling. Our example does a single full export without batching, which may not work if you have a lot of data to export. For running an export on a schedule, you might want to ensure only data which changed since last run was exported.

ApertureDB's Label Studio Integration

ApertureDB now offers seamless integration with Label Studio, expanding your storage options for both importing raw data and exporting labeled data. This integration empowers you to connect Label Studio directly to ApertureDB, accelerating your data labeling workflows and optimizing your machine learning pipelines.

Key Features

Import Data from ApertureDB:

  • Effortlessly pull data directly from your ApertureDB instances for labeling in Label Studio.
  • Generate Label Studio tasks and predictions directly from ApertureDB images and bounding boxes.
  • Utilize ApertureDB's efficient query and filtering capabilities to select specific data subsets for labeling.

Export Annotations to ApertureDB:

  • Seamlessly store labeled data back into ApertureDB for further analysis, model training, and integration with other tools.
  • Maintain data consistency and accessibility within your ApertureDB ecosystem.
  • Keep all of your Label Studio data in ApertureDB.

Configuring a Label Studio project with ApertureDB storage

Prerequisites

  • An active ApertureDB instance

Setup Guide

Install the Integration:

  • Download the integrates Label Studio package from github.
  • Follow the specific installation instructions provided within the package.

Configure ApertureDB as Input Storage:

  • Access the "Storage" section in your Label Studio project settings.
  • Add a new Input Storage, selecting "ApertureDB" as the type.
  • Provide your ApertureDB connection details, including:
    • ApertureDB hostname and port
    • ApertureDB authentication username and either password or token
      • Note: the user must have read access to ApertureDB.
  • Optionally add constraints to filter the ApertureDB images to consider.
  • Optionally include ApertureDB bounding boxes as predictions, with or without constraints.
  • Check the connection to verify that that is it properly configured.
  • Save the storage.
  • Sync storage.

In the background, Label Studio will query ApertureDB for images and bounding boxes that it will use to automatically generate annotation tasks.

Configure ApertureDB as Export Storage:

  • Access the "Storage" section in your Label Studio project settings.
  • Add a new Export Storage, selecting "ApertureDB" as the type.
  • Provide your ApertureDB connection details, including:
    • ApertureDB hostname and port
    • ApertureDB authentication username and either password or token
      • Note: the user must have write access to ApertureDB nodes.
  • Check the connection to verify that that is it properly configured.
  • Save the storage.

As annotation tasks are completed, the annotation data and bounding boxes will be saved back to ApertureDB.