Frequently Asked Questions

ApertureData offers a purpose-built database, ApertureDB, for multimodal data such as images, videos, feature vectors, and associated metadata including annotations.

Getting Started

Q: What are some technical use cases leveraging ApertureDB in production?
A: We have been working with customers across a variety of verticals and initiatives. For example, we have smart retail customers using ApertureDB for annotation management and nearest neighbor search. Across these verticals, there are some common technical benefits that our customers love ApertureDB for. Those are described in our documentation.

Q: Why should we work with a startup instead of some of the more established database vendors?
A: ApertureDB is a purpose-built database for visual analytics, designed from the ground up and optimized to handle all of the data needs of complex visual ML workloads. Our documentation further outlines the architectural and design benefits of such a unified backend. Beyond the fundamental design:
1) We have built this database using state-of-the-art software development tools, with a focus on user friendliness, security, ease of validation, and ease of deployment and updates. Traditional databases struggle to offer this, leaving users to work that much harder to simplify their lives, which can be debilitating when the AI world is moving so fast.
2) We see the entire ML pipeline as our workload. We are exposed to the tools that work and the ones that don't, the different ways in which pipelines can be set up, and the challenges, and we bring all of that knowledge when we work with our users.
3) We are here to ensure that our customers succeed; that is our primary driving philosophy. Customer requests rarely go unanswered for more than two hours on their dedicated Slack channel, and we hold regular cadence calls to not miss a beat!

Q: What support does the ApertureData team offer when working with customers?
A: We work together with our users to install, configure, and manage the database, define their application schema, ingest their existing data, and provide sample queries. We provide detailed diagrams and documentation for maintaining the database, and help create integrations with other components of their ML pipelines, e.g., with PyTorch or a labeling tool. We host hands-on tutorials to help teams get comfortable with our API and tools, weekly Q&A sessions, and regular cadence calls to stay in sync with customer priorities and roadmap. We can be reached on our Slack channel (please ask to join aperturedata.slack.com) and via email (support@aperturedata.io) with any questions.

Q: I have all my data accessible from a cloud bucket. Why do I need ApertureDB?
A: The ability to search, efficiently access, process, and visualize data is paramount for the success of AI deployments. When it comes to unstructured data, specifically images, videos, or even documents, just knowing filenames often isn't enough. Such data needs to be searched via different modalities like metadata, labels, and embeddings, and preprocessing it requires complex libraries like FFmpeg or OpenCV. ApertureDB colocates all of the relevant data for efficient retrieval and handles complex queries transactionally. It is purpose-built to hide that complexity behind a simple interface, so our target users can go back to focusing on analytics instead of spending weeks or months on data preparation. Complex data contains rich insights, but those insights are only valuable if timely, and that requires the right tools!

Q: Where does ApertureDB fit in this world of vector databases?
A: Vector databases have been used for visual recommendations and classification for quite some time, and they have become especially popular due to LLMs and semantic search. ApertureDB is a full-featured vector database with k-NN query performance rivaling any other available product. But vector embeddings are just one type of data that applications need, as we explain in this blog. Metadata, labels, and the original files are equally important to access. This frequently leads to a complex implementation that stitches together an ad hoc combination of a metadata store, a cloud object store, a vector database, and some processing libraries, which is inefficient and heavily prone to errors. With ApertureDB, we offer a database designed for data such as images, videos, documents, embeddings, and associated metadata including annotations. Our biggest value proposition is that we integrate the functionality of a vector database, an intelligence graph, and visual or document data to seamlessly query across data domains.
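
As an illustrative sketch, here is what such a cross-domain query can look like in our JSON query language, assuming a connected Python client (see "How do users connect" below). The descriptor set name and the "label" property are placeholders for your own schema:

```python
import numpy as np

# Hypothetical 512-dimensional query embedding; in practice this
# comes from your embedding model.
embedding = np.random.rand(512).astype(np.float32)

# Find the 5 nearest descriptors to the query embedding, then fetch
# the images connected to them, all in one transaction.
query = [{
    "FindDescriptor": {
        "set": "clip_embeddings",   # placeholder descriptor set name
        "k_neighbors": 5,
        "_ref": 1
    }
}, {
    "FindImage": {
        "is_connected_to": {"ref": 1},
        "results": {"list": ["label"]},
        "blobs": True
    }
}]

# The query embedding is passed as a binary blob alongside the query.
response, blobs = client.query(query, [embedding.tobytes()])
```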

Q: Our data management is messy right now but we don't have a lot of time or resources to fix it. How long does customer onboarding take?
A: We can set up a hosted environment for your data within a few minutes. If you would like to install ApertureDB in your virtual private cloud, we can provide you access to our DockerHub repository to get started. While data ingestion is usually a matter of a day or a few days, the overall time to onboard depends on how quickly we can start working with you on schema definition and data ingestion. In short, as soon as we settle the engagement details, we function as your extended team during onboarding, and it should take no more than a week or two.

Q: What is the current engagement or pricing model for ApertureDB?
A: Our pricing is an annual or multi-year subscription based on the number of database instances, storage tier, and support level. We do not charge per user or per number of objects managed by ApertureDB, making it extremely cost-effective. We offer a few different options to try out and onboard our customers. Write to us to learn more.

Application Development

Q: How do users connect with ApertureDB instances?
A: You can access the data directly using our REST API, or integrate with one of the client libraries we provide for C++ and Python applications. Our Python package is available to download via PyPI. For adding data programmatically as it is generated, we have a JSON API, described in our documentation.
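
As a minimal sketch, here is how a Python application might connect and run a simple query (the host, credentials, and the "category" property are placeholders):

```python
from aperturedb.Connector import Connector

# Connect to an ApertureDB instance; host and credentials are placeholders.
client = Connector(host="aperturedb.example.com", port=55555,
                   user="admin", password="admin")

# Find up to 5 images whose "category" property equals "cats",
# returning both the matching metadata and the image blobs.
query = [{
    "FindImage": {
        "constraints": {"category": ["==", "cats"]},
        "results": {"list": ["category"], "limit": 5},
        "blobs": True
    }
}]

response, blobs = client.query(query)
print(response)
```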

Q: How long does it take to ingest existing data?
A: While it depends on how the existing data is organized in different tables and storage, as well as the encodings of the visual data, our loaders have, in some customer use cases, loaded about 8 million images with metadata in under 3 hours, around 5 million descriptors in about 2.5 hours, and about 5 million segmentation polygons in half an hour.
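
As a rough sketch, bulk ingestion with our Python loader classes looks like the following (the CSV file name and the tuning parameters are illustrative):

```python
from aperturedb.Connector import Connector
from aperturedb.ImageDataCSV import ImageDataCSV
from aperturedb.ParallelLoader import ParallelLoader

client = Connector(host="aperturedb.example.com",
                   user="admin", password="admin")

# images.csv lists one image per row; the remaining columns
# become queryable properties on each image.
data = ImageDataCSV("images.csv")

# Ingest in parallel; batch size and thread count are illustrative
# knobs that can be tuned to the instance's resources.
loader = ParallelLoader(client)
loader.ingest(data, batchsize=100, numthreads=8, stats=True)
```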

Q: What's the largest scale of data at which you have tested ApertureDB on a single virtual or physical machine instance?
A: While the specific numbers depend on how large each object gets, the workload itself, and the resources provisioned to the ApertureDB instance, we have so far scaled ApertureDB to over 1.3 billion entities and over 300 million images on a biotech AI workload. There is another, more detailed study presented in our recently published VLDB paper using the open-source precursor of ApertureDB (VDMS). ApertureDB is already a significant improvement over VDMS, and those numbers are forthcoming.

Q: What ML tools and technologies do you integrate with?
A: ML pipelines very often involve multiple technologies: tools for data curation, data labeling, ML training and classification, business queries, and visualization, among others. Our key observation has been that, no matter where you are in the pipeline, you are in one way or another interacting with the data and metadata, and we can offer a unified and efficient way to interact with it regardless of the stage. As of today, ApertureDB offers integrations with PyTorch, REST-based frontends, labeling tools like Label Studio, cloud object stores like Amazon S3, and visualization with Voxel51 in addition to our web UI, among others. Please find some examples in our documentation.

Q: I was interested in bulk metadata and bulk object retrieval to launch large scale training. Does ApertureDB have a way to simplify and speed this process for me?
A: Yes, we currently provide a "batching" API that allows multiple workers to retrieve portions of a large response in bulk. This API was designed for bulk analytics; in fact, our dataset loaders for ML frameworks like PyTorch rely on it to fetch batches of images or videos, pipelining data loading with training epochs.
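
As a sketch of how this looks with PyTorch (class and argument names may differ slightly across client versions; the query and the "split" / "category" properties are placeholders):

```python
import torch
from aperturedb.Connector import Connector
from aperturedb.PyTorchDataset import ApertureDBDataset

client = Connector(host="aperturedb.example.com",
                   user="admin", password="admin")

# Select the training split; the dataset wrapper fetches images
# in batches behind the scenes using the batching API.
query = [{
    "FindImage": {
        "constraints": {"split": ["==", "train"]},
        "results": {"list": ["category"]},
        "blobs": True
    }
}]

dataset = ApertureDBDataset(client, query, label_prop="category")
loader = torch.utils.data.DataLoader(dataset, batch_size=32, num_workers=4)

for images, labels in loader:
    pass  # training step goes here
```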

Q: Is the product meant for both operational and analytical usage? If yes, then what is the impact of bulk operations and how do they affect online access?
A: The product is meant for both operational and analytical usage. The server is fully concurrent and can support both types of operations at the same time. There may be increased latency in responses if both operations touch the same data, but only if at least one of them is performing writes. In this sense, you can expect the same behavior as from any other fully concurrent database.

Q: Does the system require professional/manual fine-tuning expertise to keep it running efficiently with growing data?
A: No. The main fine-tuning needed is in the creation of indexes, which is fully documented. This is manual tuning, but it is standard practice for databases. For the rest, we keep scaling without our users having to worry about the system underneath.
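
As a brief sketch, creating an index is a single JSON command (the entity class and property below are placeholders):

```python
# Index the "name" property of "Person" entities to speed up
# lookups and constraints on that property.
query = [{
    "CreateIndex": {
        "index_type": "entity",
        "class": "Person",
        "property_key": "name"
    }
}]

response, _ = client.query(query)
```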

ApertureDB Deployment

Q: What do we deploy?
A: We have packaged ApertureDB and all of its dependencies in a collection of Docker images that can be pulled from our DockerHub or a cloud-specific container registry, when ready. In addition to the ApertureDB server, we deploy Docker images containing our web frontend and monitoring dashboards through Prometheus / Grafana. We can also provide a Docker image with Jupyter notebooks that are great for trying out the API. Our Python client package is available to install via pip, and we can provide examples of using our data loaders and sample queries as needed. More information can be found in our documentation.

If you prefer, we can host ApertureDB in our cloud account, with your choice of cloud provider and regions, and manage all the instances for you. Our distributed database can be deployed via Kubernetes, and we can work with you to configure the cluster using Terraform, Helm, or any other method preferred in your organization.

Q: Can we install ApertureDB in AWS or GCP?
A: Yes. ApertureDB is cloud agnostic and can run on any cloud provider.

Q: How do updates work?
A: For single-machine updates, we push a new Docker image that you can pull and run. We can push the new version to a cloud registry of your choice, or you can pull directly from our registries. If we manage your instances, this process is automatic. We are working on incremental updates with our Kubernetes instantiation to avoid any downtime.

Q: Can my team write Kubernetes manifests to execute the Docker image as a StatefulSet with persistent disk image and the required hardware dedicated to it (CPU Cores and Memory)?
A: Yes, most certainly.

Q: How can I monitor my ApertureDB deployment?
A: ApertureDB integrates with Prometheus and Grafana for telemetry, visibility, and log aggregation. Please contact us for a demo.

Security and Compliance

Q: How does ApertureDB support authentication and access control?
A: We support password- and token-based authentication, encrypted communication between our Python / C++ clients and the ApertureDB server, as well as role-based access control (RBAC). Users and roles can be managed from our web UI.

Q: Does an ApertureDB instance in a customer VPC communicate with ApertureData team?
A: No, it does not.

Q: Which resource and service access (read/write/both) will you need and for what purpose?
A: We don't need any resource or service access, other than access to the VM where ApertureDB is running if some debugging is needed. We do not transfer any data to our cloud account; it stays within your VPC. If we host your instances, you can still manage your own logins and passwords and choose your own cloud-encrypted disks.

Q: What open-source technologies are used as the basis for the solution?
A: The main open-source technologies are OpenCV, Faiss, LMDB, and FFmpeg. We started our journey with Intel's VDMS and PMGD open-source projects, which form the basis of our server-side implementation; both systems have since been rewritten to support production requirements. In addition, we use OpenSSL to support encrypted communication between the ApertureDB client and server, and we have built a monitoring tool using Prometheus and Grafana.