Interact with PyTorch Objects

This notebook illustrates a few ways in which you can work with PyTorch and ApertureDB.

Prerequisites:

Access to an ApertureDB instance.
aperturedb-python installed (note that PyTorch is pulled in as a dependency of aperturedb).
COCO dataset files downloaded. We will use the validation set in the following cells.

Install ApertureDB SDK and download COCO dataset

%pip install aperturedb[complete]

!mkdir coco
!cd coco && wget http://images.cocodataset.org/zips/val2017.zip && unzip val2017.zip && wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip && unzip annotations_trainval2017.zip

# Define some common variables such as the ApertureDB host, port, user, and password
import dbinfo

Steps

Load PyTorch dataset into ApertureDB

This step ingests a PyTorch CocoDetection dataset into ApertureDB. The ApertureDB-specific semantics are handled by a class named CocoDataPyTorch. It uses aperturedb.PytorchData as its base class and implements a generate_query method, which translates each item as it is represented in CocoDetection (a PyTorch dataset object) into the corresponding ApertureDB queries.
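As a rough illustration of the kind of translation generate_query performs, the sketch below converts the annotations of one CocoDetection-style item into a list of ApertureDB commands. It is not the code in CocoDataPyTorch: the function name item_to_query, the choice of properties, and the exact fields passed to AddImage and AddBoundingBox are assumptions made for illustration, and the image blob itself is left out.

```python
def item_to_query(annotations, dataset_name):
    """Build a hypothetical ApertureDB query for one image and its annotations.

    `annotations` is a non-empty list of COCO-style dicts with "image_id",
    "bbox" ([x, y, width, height]) and "category_id" keys.
    """
    image_id = annotations[0]["image_id"]
    # One AddImage command; "_ref" lets later commands refer to this image.
    query = [{
        "AddImage": {
            "_ref": 1,
            "properties": {"image_id": image_id, "dataset_name": dataset_name},
        }
    }]
    # One AddBoundingBox command per annotation, linked to the image above.
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        query.append({
            "AddBoundingBox": {
                "image_ref": 1,
                # Coordinates are truncated to integers in this sketch.
                "rectangle": {"x": int(x), "y": int(y),
                              "width": int(w), "height": int(h)},
                "properties": {"category_id": ann["category_id"]},
            }
        })
    return query


sample = [{"image_id": 139, "bbox": [412.8, 157.61, 53.05, 138.01], "category_id": 1}]
query = item_to_query(sample, "coco_validation_with_annotations")
print(len(query))
```

Each item produces one command for the image plus one per annotation, so the sample above yields a two-command query.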

from aperturedb.ParallelLoader import ParallelLoader
from CocoDataPyTorch import CocoDataPyTorch


loader = ParallelLoader(dbinfo.create_connector())

coco_detection = CocoDataPyTorch("coco_validation_with_annotations")

# Let's use the first 100 images from the CocoDataPyTorch object that have annotations for this demo.
# Ingesting all of them could be time consuming.
images = []
for t in coco_detection:
    X, y = t  # X is the image, y is its list of annotations
    if len(y) > 0:
        images.append(t)
        if len(images) == 100:
            break

loader.ingest(images, stats=True)

loading annotations into memory...
Done (t=0.26s)
creating index...
index created!
 Progress: 100.00% - ETA(s): 0.00
 ============ ApertureDB Loader Stats ============
Total time (s): 0.285294771194458
Total queries executed: 100
Avg Query time (s): 0.05476363658905029
Query time std: 0.02537986835791064
Avg Query Throughput (q/s)): 73.04116835805202
Overall insertion throughput (element/s): 350.51466096390396
Total inserted elements: 100
=================================================

Note that we ingested the first 100 images that have annotations.
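The selection loop above can also be expressed with itertools.islice, which stops iterating as soon as enough matching items have been collected. This is a sketch on stand-in data: toy_dataset is a hypothetical list of (image, annotations) pairs rather than a CocoDataPyTorch object, and first_annotated is a name chosen for illustration.

```python
from itertools import islice


def first_annotated(dataset, limit):
    """Return up to `limit` (image, annotations) pairs that have annotations."""
    annotated = (item for item in dataset if len(item[1]) > 0)
    return list(islice(annotated, limit))


# Stand-in for a dataset: every third item has no annotations.
toy_dataset = [(f"img{i}", [] if i % 3 == 0 else [{"id": i}]) for i in range(10)]
subset = first_annotated(toy_dataset, 4)
print([name for name, _ in subset])
```

Because the generator is lazy, items after the last one needed are never read, which matters when each read decodes an image.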

Inspect a sample of the data that has been ingested into ApertureDB

from aperturedb import Images
import pandas as pd

images = Images.Images(dbinfo.create_connector())
constraints = Images.Constraints()
constraints.equal("dataset_name", "coco_validation_with_annotations")

images.search(limit=5, constraints=constraints)

images.display(show_bboxes=True)

pd.json_normalize(images.get_properties(images.get_props_names()).values())

	area	bbox	category_id	dataset_name	id	image_id	keypoints	num_keypoints	segmentation
0	2913.11040	412.8 157.61 53.05 138.01	1	coco_validation_with_annotations	230831	139	427 170 1 429 169 2 0 0 0 434 168 2 0 0 0 441 ...	15	[428.19, 219.47, 430.94, 209.57, 430.39, 210.1...
1	27789.11055	280.79 44.73 218.7 346.68	1	coco_validation_with_annotations	442619	785	367 81 2 374 73 2 360 75 2 386 78 2 356 81 2 3...	17	[353.37, 67.65, 358.15, 52.37, 362.92, 47.59, ...
2	25759.04240	145.26 100.67 291.95 457.35	1	coco_validation_with_annotations	559508	872	367 138 2 0 0 0 360 134 2 0 0 0 338 144 2 0 0 ...	12	[310.65, 112.18, 339.42, 100.67, 362.43, 106.4...
3	10451.76710	277.31 189.99 140.09 208.22	1	coco_validation_with_annotations	439117	885	374 216 2 379 214 2 370 212 2 0 0 0 359 208 2 ...	16	[282.11, 262.92, 297.46, 229.33, 312.81, 215.9...
4	2343.98800	214.61 154.48 66.51 129.11	1	coco_validation_with_annotations	256326	1353	260 185 2 0 0 0 255 181 2 0 0 0 237 184 2 253 ...	9	[225.37, 176, 224.88, 192.63, 214.61, 211.7, 2...

Use data from ApertureDB in a PyTorch DataLoader

The elements returned by an ApertureDB query can be used as a dataset, which in turn can be consumed by the PyTorch data loader. The following example uses a subset of the data to demonstrate this.

from aperturedb import Images
from aperturedb import PyTorchDataset
import time
from IPython.display import Image, display
import cv2


db = dbinfo.create_connector()

query = [{
    "FindImage": {
        "constraints": {
            "dataset_name": ["==", "coco_validation_with_annotations"]
        },
        "blobs": True
    }
}]
dataset = PyTorchDataset.ApertureDBDataset(db, query)
print("Total Images in dataloader:", len(dataset))
start = time.time()
# Iterate over dataset.
for i, img in enumerate(dataset):
    if i >= 5:
        break
    # img[0] is a decoded, cv2 image
    converted = cv2.cvtColor(img[0], cv2.COLOR_BGR2RGB)
    encoded = cv2.imencode(ext=".jpeg", img=converted)[1]
    ipyimage = Image(data=encoded, format="JPEG")
    display(ipyimage)

print("Throughput (imgs/s):", len(dataset) / (time.time() - start))

Total Images in dataloader: 58

Throughput (imgs/s): 315.366108884245
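The throughput figure above divides the full dataset length by the time spent reading the first five images. To report the rate for the items that were actually read, one option is to count them while iterating. The helper below is a sketch: measure_throughput is a hypothetical name, not part of aperturedb, and it accepts any iterable, so a plain range stands in for the dataset here.

```python
import time
from itertools import islice


def measure_throughput(items, limit):
    """Consume up to `limit` items and return (count, items per second)."""
    start = time.perf_counter()
    count = 0
    for _ in islice(items, limit):
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very fast iterables.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


count, rate = measure_throughput(range(58), limit=5)
print("Items read:", count)
```

With a real dataset, the same call would time only the reads it performs, so the result reflects the measured items rather than the dataset size.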

Create a dataset with resized images

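The FindImage command also accepts a list of operations, such as the resize used below. See the ApertureDB query language documentation for more details on writing custom constraints and operations when finding images.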

query = [{
    "FindImage": {
        "constraints": {
            "dataset_name": ["==", "coco_validation_with_annotations"]
        },
        "operations": [
            {
                "type": "resize",
                "width": 224
            }
        ],
        "blobs": True
    }
}]
dataset = PyTorchDataset.ApertureDBDataset(db, query)
print("Total Images in dataloader:", len(dataset))
start = time.time()
# Iterate over dataset.
for i, img in enumerate(dataset):
    if i >= 5:
        break
    # img[0] is a decoded, cv2 image
    converted = cv2.cvtColor(img[0], cv2.COLOR_BGR2RGB)
    encoded = cv2.imencode(ext=".jpeg", img=converted)[1]
    ipyimage = Image(data=encoded, format="JPEG")
    display(ipyimage)

print("Throughput (imgs/s):", len(dataset) / (time.time() - start))

Total Images in dataloader: 58

Throughput (imgs/s): 292.91137271333844
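Since the two queries in this notebook differ only in their operations list, a small helper can build both. The function below is a sketch: find_image_query is a hypothetical name, and it only assembles the same dictionary structure used in the cells above, passing any operations through unchanged.

```python
def find_image_query(dataset_name, operations=None):
    """Build a FindImage query for one dataset, optionally with operations."""
    command = {
        "constraints": {"dataset_name": ["==", dataset_name]},
        "blobs": True,
    }
    if operations:
        # Include the operations list only when one is given.
        command["operations"] = list(operations)
    return [{"FindImage": command}]


plain = find_image_query("coco_validation_with_annotations")
resized = find_image_query(
    "coco_validation_with_annotations",
    operations=[{"type": "resize", "width": 224}],
)
print(resized[0]["FindImage"]["operations"])
```

Either result can be passed as the query argument wherever the hand-written query list is used above.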

Create a DataLoader from the dataset to use in other PyTorch methods.

from torch.utils.data import DataLoader

dl = DataLoader(dataset=dataset)
# dl is a torch.utils.data.DataLoader object, which can be used in PyTorch
# https://pytorch.org/tutorials/beginner/basics/data_tutorial.html

len(dl)

The PyTorch DataLoader (dl) provides the interfaces to batch, shuffle, and load this dataset with multiple worker processes in the remainder of the pipeline.
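One detail to watch when batching: PyTorch's default collation stacks the items of a batch into a single tensor, which requires every image in the batch to have the same shape. If the images returned by a query differ in size, a custom collate function that keeps them in a list is one way around this. The sketch below assumes, as in the loops above, that each dataset item is a tuple whose first element is the image. The name collate_as_list is hypothetical, and nested lists stand in for decoded images so the example runs without a database.

```python
def collate_as_list(batch):
    """Group a batch of dataset items without stacking them.

    `batch` is a list of items, each a tuple whose first element is the image.
    Returns (images, rest), where `rest` holds the remaining fields per item.
    """
    images = [item[0] for item in batch]
    rest = [item[1:] for item in batch]
    return images, rest


# Two stand-in "images" with different heights (2 rows and 3 rows).
batch = [([[0, 0], [0, 0]], "a"), ([[1, 1], [1, 1], [1, 1]], "b")]
images, rest = collate_as_list(batch)
print([len(img) for img in images])
```

With PyTorch installed, such a function could be passed as DataLoader(dataset, batch_size=8, shuffle=True, collate_fn=collate_as_list). The arguments batch_size, shuffle, and collate_fn are standard DataLoader parameters; the values shown are only examples.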
