Ingest Cookbook (DataModels)
Building on the concepts that describe the different ways of ingesting data, this notebook walks through an example that uses the DataModel method.
We will use the Cookbook dataset and persist its data to an ApertureDB instance.
Additional scripts used:
create_nested_json.py: This script merges the first 3 sheets of the source into a single JSON file. The file holds a list of Dish objects; each Dish may have multiple ingredients, and each ingredient has its own miscellaneous properties.
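To make the nesting described above concrete, here is a minimal sketch of how flat rows could be merged into nested dish records. It is not the actual create_nested_json.py; the three row lists, their column names, and the helper `nest_dishes` are assumptions made for illustration only.

```python
import json
from collections import defaultdict

# Hypothetical rows standing in for the three source sheets (layout assumed).
dish_rows = [
    {"dish_id": 1, "name": "rajma chawal", "cuisine": "Indian"},
    {"dish_id": 2, "name": "plain rice", "cuisine": "Indian"},
]
dish_ingredient_rows = [
    {"dish_id": 1, "ingredient": "red kidney beans"},
    {"dish_id": 1, "ingredient": "rice"},
    {"dish_id": 2, "ingredient": "rice"},
]
ingredient_rows = [
    {"Name": "red kidney beans", "subgroup": "legume", "macronutrient": "protein"},
    {"Name": "rice", "subgroup": "grain", "macronutrient": "carbohydrates"},
]


def nest_dishes(dishes, links, ingredients):
    """Return one record per dish with its ingredient records embedded."""
    # Index ingredient properties by name for quick lookup.
    by_name = {row["Name"]: row for row in ingredients}
    # Group ingredient names by the dish they belong to.
    names_by_dish = defaultdict(list)
    for link in links:
        names_by_dish[link["dish_id"]].append(link["ingredient"])
    nested = []
    for dish in dishes:
        record = dict(dish)  # copy, so the input rows stay untouched
        record["ingredients"] = [
            dict(by_name[name])
            for name in names_by_dish[dish["dish_id"]]
            if name in by_name
        ]
        nested.append(record)
    return nested


nested = nest_dishes(dish_rows, dish_ingredient_rows, ingredient_rows)
print(json.dumps(nested[0], indent=2))
```

Embedding the ingredient records inside each dish means a single JSON record carries everything needed to build one Dish and its Ingredients later on.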
Connect to the database.
- If you haven't already set up or configured the database, check out our quick start guide
# Install the required packages
%pip install --upgrade --quiet pip
%pip install --upgrade --quiet aperturedb
Prepare Input Data.
# Get the script that generates the input JSON file (read later as dishes.json)
!wget https://github.com/aperture-data/Cookbook/raw/refs/heads/main/scripts/create_nested_json.py
# Run the script to generate the input JSON file
!python create_nested_json.py
Data Model Definitions
A popular way to define a schema in Python is pydantic, and we use it here to define the Dish and Ingredient models of our Cookbook and the association between them.
from typing import List, Optional
from aperturedb.DataModels import ImageDataModel, IdentityDataModel


class Ingredient(IdentityDataModel):
    Name: str
    other_names: Optional[str] = ""
    macronutrient: Optional[str] = ""
    micronutrient: Optional[str] = ""
    subgroup: Optional[str] = ""
    category: Optional[str] = ""


class Dish(ImageDataModel):
    contributor: str
    name: str
    location: str
    caption: str
    recipe_url: str
    cuisine: str
    dish_id: int
    ingredients: List[Ingredient]
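The following sketch shows how pydantic turns a nested dictionary into nested objects, which is the behavior the models above rely on. It uses plain pydantic `BaseModel` classes as stand-ins so that it runs without an ApertureDB installation; `IngredientSketch` and `DishSketch` are hypothetical names, not part of aperturedb, and they carry only a subset of the fields.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


# Stand-ins for the notebook's models (hypothetical, plain pydantic only).
class IngredientSketch(BaseModel):
    Name: str
    other_names: Optional[str] = ""
    macronutrient: Optional[str] = ""


class DishSketch(BaseModel):
    name: str
    dish_id: int
    ingredients: List[IngredientSketch]


record = {
    "dish_id": 1,
    "name": "rajma chawal",
    "ingredients": [
        {"Name": "red kidney beans", "macronutrient": "protein"},
        {"Name": "rice"},  # optional fields fall back to their defaults
    ],
}

# Nested dictionaries are converted into IngredientSketch instances.
dish = DishSketch(**record)
print(type(dish.ingredients[0]).__name__, repr(dish.ingredients[1].macronutrient))

# A record that lacks a required field is rejected at construction time.
try:
    DishSketch(dish_id=2, ingredients=[])
    missing_name_rejected = False
except ValidationError:
    missing_name_rejected = True
```

Because validation happens when the object is constructed, a malformed record fails before any query is generated for it.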
Create Objects Based on Data Models
The JSON file generated by running the script create_nested_json.py describes the objects to be ingested.
Sample record in dishes.json:
{
    "dish_id": 1,
    "url": "https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images/001 Large.jpeg",
    "type": "main dish",
    "location": "NJ",
    "cuisine": "Indian",
    "recipe_url": "https://www.tarladalal.com/rajma-chawal-punjabi-rajma-chawal-4951r",
    "contributor": "gautam",
    "caption": "Beans with rice",
    "name": "rajma chawal",
    "ingredients": [
        {
            "Name": "red kidney beans",
            "other_names": "rajma",
            "category": "vegetarian",
            "subgroup": "legume",
            "macronutrient": "protein"
        },
        {
            "Name": "rice",
            "other_names": "chawal",
            "category": "vegetarian",
            "subgroup": "grain",
            "macronutrient": "carbohydrates"
        }
    ]
}
Objects built from these records can be passed to the function generate_add_query, which generates the queries that ApertureDB executes to persist the objects in the database.
Generate and execute queries.
import json

from aperturedb.CommonLibrary import execute_query, create_connector
from aperturedb.Query import generate_add_query
from tqdm.auto import tqdm

# Create the client used to talk to the database
client = create_connector()

with open("dishes.json") as ins:
    dishes = json.load(ins)

for dish in tqdm(dishes):
    # Create the Dish object, along with its ingredients
    dish = Dish(**dish)
    # Create a query to be run against the database
    query, blobs, _ = generate_add_query(dish)
    # Execute the query using the client created above
    result, response, output_blobs = execute_query(client, query, blobs)
    # A non-zero result indicates failure; print the response and stop
    if result != 0:
        print(response)
        break
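The loop above stops at the first failed query. As an alternative, the sketch below keeps going and collects the failures so they can be reviewed afterwards. It is a pattern, not part of aperturedb: `ingest_records` and `fake_run` are hypothetical helpers, and `fake_run` stands in for the real steps (building the Dish, calling generate_add_query, then execute_query) so the example runs without a database. The only convention borrowed from the loop above is that a status of 0 means success.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def ingest_records(
    records: Iterable[Record],
    run: Callable[[Record], Tuple[int, Any]],
) -> List[Tuple[int, Any]]:
    """Run every record and return (index, response) for each failure."""
    failures = []
    for index, record in enumerate(records):
        status, response = run(record)
        if status != 0:  # non-zero status is treated as a failure
            failures.append((index, response))
    return failures


def fake_run(record: Record) -> Tuple[int, Any]:
    """Stand-in executor (assumption): fails when the record has no name."""
    if "name" not in record:
        return -1, {"error": "missing name"}
    return 0, {"status": 0}


failures = ingest_records(
    [{"name": "rajma chawal"}, {}, {"name": "rice"}],
    fake_run,
)
print(failures)
```

In the notebook, `run` could be a small function that wraps the three calls from the loop above and returns the result and response of execute_query.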