Interact with PyTorch Objects

This notebook illustrates a few ways in which you can work with PyTorch and ApertureDB.

Prerequisites:

  • Access to an ApertureDB instance.
  • aperturedb-python installed (note that PyTorch is pulled in as a dependency of aperturedb).
  • COCO dataset files downloaded. We will use the validation set in the following cells.

Install ApertureDB SDK and download COCO dataset

%pip install aperturedb[complete]

!mkdir coco
!cd coco && wget http://images.cocodataset.org/zips/val2017.zip && unzip val2017.zip && wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip && unzip annotations_trainval2017.zip
# Define some common variables, such as the ApertureDB host, port, user, and password
import dbinfo

Steps

Load PyTorch dataset into ApertureDB

This step takes a PyTorch CocoDetection dataset and ingests it into ApertureDB. To handle the ApertureDB semantics, a class CocoDataPyTorch is implemented. It uses aperturedb.PytorchData as a base class and implements a generate_query method, which translates the data as it is represented in CocoDetection (a PyTorch dataset object) into the corresponding queries for ApertureDB.
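To give a feel for what such a translation involves, here is a rough, self-contained sketch of mapping one CocoDetection sample to an ApertureDB-style transaction: an AddImage command followed by one AddBoundingBox per annotation. This is an illustrative assumption of the shape of the query, not the actual output of the SDK's generate_query; the helper name and property names are hypothetical.

```python
def coco_sample_to_query(image_bytes, annotations, dataset_name):
    """Hypothetical sketch: translate one CocoDetection sample into an
    ApertureDB-style transaction (a list of commands plus blobs)."""
    query = [{
        "AddImage": {
            "_ref": 1,  # lets later commands in the transaction refer to this image
            "properties": {"dataset_name": dataset_name},
        }
    }]
    for ann in annotations:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        query.append({
            "AddBoundingBox": {
                "image_ref": 1,
                "rectangle": {"x": int(x), "y": int(y),
                              "width": int(w), "height": int(h)},
                "properties": {"category_id": ann["category_id"]},
            }
        })
    return query, [image_bytes]  # commands plus the image blob


# Example with a single fake annotation:
q, blobs = coco_sample_to_query(
    b"<jpeg bytes>",
    [{"bbox": [412.8, 157.61, 53.05, 138.01], "category_id": 1}],
    "coco_validation_with_annotations",
)
```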


from aperturedb.ParallelLoader import ParallelLoader
from CocoDataPyTorch import CocoDataPyTorch


loader = ParallelLoader(dbinfo.create_connector())

coco_detection = CocoDataPyTorch("coco_validation_with_annotations")

# Let's use 100 images from the CocoDataPyTorch object that have annotations, for the purpose of this demo.
# Ingesting all of them could be time-consuming.
images = []
for t in coco_detection:
    X, y = t
    if len(y) > 0:
        images.append(t)
    if len(images) == 100:
        break

loader.ingest(images, stats=True)

loading annotations into memory...
Done (t=0.26s)
creating index...
index created!
Progress: 100.00% - ETA(s): 0.00
============ ApertureDB Loader Stats ============
Total time (s): 0.285294771194458
Total queries executed: 100
Avg Query time (s): 0.05476363658905029
Query time std: 0.02537986835791064
Avg Query Throughput (q/s)): 73.04116835805202
Overall insertion throughput (element/s): 350.51466096390396
Total inserted elements: 100
=================================================

Note that we ingested the first 100 images that have annotations.

Inspect a sample of the data that has been ingested into ApertureDB

from aperturedb import Images
import pandas as pd

images = Images.Images(dbinfo.create_connector())
constraints = Images.Constraints()
constraints.equal("dataset_name", "coco_validation_with_annotations")

images.search(limit=5, constraints=constraints)

images.display(show_bboxes=True)

pd.json_normalize(images.get_properties(images.get_props_names()).values())
[DataFrame output: five rows, one per annotation, with columns area, bbox, category_id, dataset_name, id, image_id, iscrowd, keypoints, num_keypoints, and segmentation]

[Five sample images are displayed with their bounding boxes.]

Use data from ApertureDB in a PyTorch DataLoader

The elements queried from ApertureDB can be used as a dataset, which in turn can be consumed by a PyTorch DataLoader. The following example uses a subset of the data for this purpose.

from aperturedb import Images
from aperturedb import PyTorchDataset
import time
from IPython.display import Image, display
import cv2


db = dbinfo.create_connector()

query = [{
    "FindImage": {
        "constraints": {
            "dataset_name": ["==", "coco_validation_with_annotations"]
        },
        "blobs": True
    }
}]
dataset = PyTorchDataset.ApertureDBDataset(db, query)
print("Total Images in dataloader:", len(dataset))
start = time.time()
# Iterate over the dataset.
for i, img in enumerate(dataset):
    if i >= 5:
        break
    # img[0] is a decoded cv2 image
    converted = cv2.cvtColor(img[0], cv2.COLOR_BGR2RGB)
    encoded = cv2.imencode(ext=".jpeg", img=converted)[1]
    ipyimage = Image(data=encoded, format="JPEG")
    display(ipyimage)

print("Throughput (imgs/s):", len(dataset) / (time.time() - start))
Total Images in dataloader: 58

[Five JPEG images are displayed.]

Throughput (imgs/s): 315.366108884245

Create a dataset with resized images

The query below shows how to write custom constraints and operations (here, a resize) when finding images.

query = [{
    "FindImage": {
        "constraints": {
            "dataset_name": ["==", "coco_validation_with_annotations"]
        },
        "operations": [
            {
                "type": "resize",
                "width": 224
            }
        ],
        "blobs": True
    }
}]
dataset = PyTorchDataset.ApertureDBDataset(db, query)
print("Total Images in dataloader:", len(dataset))
start = time.time()
# Iterate over the dataset.
for i, img in enumerate(dataset):
    if i >= 5:
        break
    # img[0] is a decoded cv2 image
    converted = cv2.cvtColor(img[0], cv2.COLOR_BGR2RGB)
    encoded = cv2.imencode(ext=".jpeg", img=converted)[1]
    ipyimage = Image(data=encoded, format="JPEG")
    display(ipyimage)

print("Throughput (imgs/s):", len(dataset) / (time.time() - start))
Total Images in dataloader: 58

[Five resized JPEG images are displayed.]

Throughput (imgs/s): 292.91137271333844

Create a DataLoader from the dataset to use in other PyTorch methods.

from torch.utils.data import DataLoader

dl = DataLoader(dataset=dataset)
# dl is a torch.utils.data.DataLoader object, which can be used in PyTorch
# https://pytorch.org/tutorials/beginner/basics/data_tutorial.html

len(dl)
58

The PyTorch DataLoader (dl) provides all the interfaces to batch, shuffle, and multiprocess this dataset in the remainder of the pipeline.
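For instance, the dataset can be batched and shuffled like any other map-style dataset. The sketch below uses a toy stand-in dataset of the same size (58 items) in place of the ApertureDBDataset, so it runs without a database connection; the batch size and tensor shape are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyImageDataset(Dataset):
    """Stand-in for ApertureDBDataset: yields (image_tensor, index) pairs."""

    def __len__(self):
        return 58  # same size as the dataset above

    def __getitem__(self, idx):
        # A dummy 3x224x224 image tensor in place of a decoded image
        return torch.zeros(3, 224, 224), idx


dl = DataLoader(ToyImageDataset(), batch_size=16, shuffle=True)
batches = list(dl)
# 58 items with batch_size=16 -> 3 full batches of 16 and a final batch of 10
print(len(batches), batches[0][0].shape)
```

With a real ApertureDBDataset in place of the toy dataset, the same DataLoader arguments (batch_size, shuffle, num_workers) apply unchanged.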