Ingest Cookbook (CSVParser)

Building on the concepts behind the different ways of ingesting data, this notebook walks through an example that uses the CSV Parser method.

We will use the Cookbook dataset and persist its data to an ApertureDB instance.

Additional scripts used:

  • convert_ingredients_adb_csv.py: The script converts the first 3 sheets of the source into 3 CSV files that the CSV Parsers in the SDK understand. For a more in-depth understanding of the various CSV Parsers, refer to this page.

Connect to the database.

  • If you haven't already set up or configured the database, check out our quick start guide.
# Install the required packages
%pip install --upgrade --quiet pip
%pip install --upgrade --quiet aperturedb

Prepare Input Data

# Get the script that generates the CSV files
!wget https://github.com/aperture-data/Cookbook/raw/refs/heads/main/scripts/convert_ingredients_adb_csv.py

# Run the script to generate the right CSV files
!python convert_ingredients_adb_csv.py
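
Before moving on, it can help to confirm that the conversion step actually produced its output. The following is a minimal sketch, assuming the script writes the three files used later in this notebook into the current working directory; the helper name `missing_csv_files` is hypothetical and not part of the SDK.

```python
from pathlib import Path

# File names as they are used later in this notebook.
EXPECTED_FILES = [
    "dishes.adb.csv",
    "ingredients.adb.csv",
    "dish_ingredients.adb.csv",
]


def missing_csv_files(directory=".", expected=EXPECTED_FILES):
    """Return the expected CSV files that are absent or empty in `directory`."""
    base = Path(directory)
    return [
        name
        for name in expected
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]


# Assumption: the conversion script wrote its output to the current directory.
missing = missing_csv_files()
if missing:
    print("Missing or empty:", ", ".join(missing))
else:
    print("All expected CSV files are present.")
```

An empty list means all three files exist and are non-empty; it says nothing about whether their contents are valid.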

Create objects of these classes.

We will provision the data using 3 CSV files prepared by the script.

Example lines from the CSV files:

dishes.adb.csv

url,id,contributor,name,type,location,cuisine,caption,Recipe URL,constraint_id
https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images/001_Large.jpeg,1,gautam,rajma chawal,main dish,NJ,Indian,>Beans with rice,https://www.tarladalal.com/rajma-chawal-punjabi-rajma-chawal-4951r,1
https://raw.githubusercontent.com/aperture-data/Cookbook/refs/heads/main/images/002_Large.jpeg,2,gautam,paneer bhurji,main dish,NJ,Indian,>"Scrambled cottage cheese with finely chopped onion, bell pepper and tomatoes",https://www.indianhealthyrecipes.com/paneer-bhurji-recipe/,2

ingredients.adb.csv

EntityClass,UUID,Name,other_names,category,subgroup,macronutrient,micronutrient,constraint_UUID
Ingredient,8ccd94efe6ac436f8c9f4b180677344a,all-purpose flour,maida,vegetarian,refined grains,carbohydrates,,8ccd94efe6ac436f8c9f4b180677344a
Ingredient,3d1ef186f6c14e61a67b01f2abfef6c4,apple,,vegetarian,fruit,carbohydrates,,3d1ef186f6c14e61a67b01f2abfef6c4

dish_ingredients.adb.csv

ConnectionClass,_Image@id,Ingredient@UUID
HasIngredient,1,328ade6ff21244bd92332704ef72bda9
HasIngredient,1,45663216d9ea4262a608b9067adc5d1f
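
In the sample above, each endpoint column of dish_ingredients.adb.csv appears to combine a class name and a key property, separated by `@` (for example `_Image@id`). The sketch below reads that header with the standard library only, to make the convention concrete. It is an illustration of the sample data, not the SDK's own parser, and `parse_connection_header` is a hypothetical helper.

```python
import csv
import io

# The sample lines shown above, inlined so the example is self-contained.
SAMPLE = """ConnectionClass,_Image@id,Ingredient@UUID
HasIngredient,1,328ade6ff21244bd92332704ef72bda9
HasIngredient,1,45663216d9ea4262a608b9067adc5d1f
"""


def parse_connection_header(header):
    """Split the two endpoint columns into (class, property) pairs."""
    if len(header) < 3 or header[0] != "ConnectionClass":
        raise ValueError("expected 'ConnectionClass' followed by two endpoint columns")
    endpoints = []
    for column in header[1:3]:
        class_name, sep, prop = column.partition("@")
        if not sep or not class_name or not prop:
            raise ValueError(f"column {column!r} is not of the form Class@property")
        endpoints.append((class_name, prop))
    return endpoints[0], endpoints[1]


reader = csv.reader(io.StringIO(SAMPLE))
source, destination = parse_connection_header(next(reader))
rows = list(reader)
print(source, destination, len(rows))
# prints: ('_Image', 'id') ('Ingredient', 'UUID') 2
```

Read this way, each row links the image whose `id` is 1 to the Ingredient with the given `UUID`, which is why the dishes and ingredients need to exist before the connections are ingested.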

Ingesting using the CSV Parsers.

Let's ingest the same data, using the pre-converted CSV files as input.

from aperturedb.ImageDataCSV import ImageDataCSV
from aperturedb.EntityDataCSV import EntityDataCSV
from aperturedb.ConnectionDataCSV import ConnectionDataCSV
from aperturedb.CommonLibrary import create_connector, execute_query
from tqdm.auto import tqdm

dishes_objects = ImageDataCSV("dishes.adb.csv")
ingredients_objects = EntityDataCSV("ingredients.adb.csv")
connection_objects = ConnectionDataCSV("dish_ingredients.adb.csv")

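# One progress-bar tick per row across all three CSV files
pbar = tqdm(total=len(dishes_objects) + len(ingredients_objects) + len(connection_objects))
client = create_connector()

# Images and entities are ingested before the connections that reference them
for objects in [dishes_objects, ingredients_objects, connection_objects]:
    for query, blobs in objects:
        result, response, output_blobs = execute_query(client, query, blobs)

        if result != 0:
            # Report the failing query and stop processing the current file
            print(response, query)
            break
        pbar.update(1)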
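
Once the loop finishes, one way to check the result is to count what landed in the database. The sketch below only builds the count queries and reads a count out of a response; `build_count_queries` and `extract_count` are hypothetical helpers, and the query parameters and response shape are written from the ApertureDB query language as I understand it, so verify them against the query reference before relying on them.

```python
def build_count_queries():
    """Build one count query per object type ingested above."""
    return {
        # "blobs": False asks for metadata only, not the image bytes.
        "images": [{"FindImage": {"blobs": False, "results": {"count": True}}}],
        "ingredients": [
            {"FindEntity": {"with_class": "Ingredient", "results": {"count": True}}}
        ],
        "connections": [
            {"FindConnection": {"with_class": "HasIngredient", "results": {"count": True}}}
        ],
    }


def extract_count(response):
    """Read the count from a single-command response.

    Assumption: the response is a list with one dict per command,
    e.g. [{"FindImage": {"count": 2, "status": 0}}].
    """
    first_command_result = next(iter(response[0].values()))
    return first_command_result["count"]


for label, query in build_count_queries().items():
    print(label, query)
    # With a live connection, reusing `client` and `execute_query` from above:
    # result, response, _ = execute_query(client, query, [])
    # if result == 0:
    #     print(label, extract_count(response))
```

Comparing these counts with the row counts of the three CSV files gives a quick signal that nothing was silently skipped.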

Load CSVs via adb CLI

Assuming that the input files were generated in the first step of the Notebook, you can use the adb tool to ingest the data as follows:

adb ingest from-csv dishes.adb.csv --ingest-type IMAGE
adb ingest from-csv ingredients.adb.csv --ingest-type ENTITY
adb ingest from-csv dish_ingredients.adb.csv --ingest-type CONNECTION
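
If you prefer to drive these commands from Python, for instance from a notebook cell or a small script, the sketch below builds the same three invocations and runs them in order. It assumes the `adb` executable is on `PATH` and exits with a non-zero status on failure; `build_ingest_commands` and `run_ingest` are hypothetical helpers, and the default is a dry run that only prints the commands.

```python
import shutil
import subprocess

# (file, ingest type) pairs taken from the commands above, in the same order.
INGEST_PLAN = [
    ("dishes.adb.csv", "IMAGE"),
    ("ingredients.adb.csv", "ENTITY"),
    ("dish_ingredients.adb.csv", "CONNECTION"),
]


def build_ingest_commands(plan=INGEST_PLAN):
    """Return one argv list per CSV file."""
    return [
        ["adb", "ingest", "from-csv", path, "--ingest-type", kind]
        for path, kind in plan
    ]


def run_ingest(dry_run=True):
    """Print the commands, or run them in order when dry_run is False."""
    for command in build_ingest_commands():
        if dry_run:
            print(" ".join(command))
            continue
        if shutil.which("adb") is None:
            raise RuntimeError("adb CLI not found on PATH")
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed step stops the run before the connections are loaded.
        subprocess.run(command, check=True)


run_ingest(dry_run=True)
```

Calling `run_ingest(dry_run=False)` executes the commands; the connections file stays last because its rows refer to the images and ingredients loaded by the first two steps.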