
Ingest From a Croissant URL Workflow

This workflow ingests datasets described by a Croissant URL into ApertureDB.

You can use your own existing data, or a related dataset available on sites such as Hugging Face, Kaggle, and Google datasets. It is an easy way to get started with ApertureDB and to see how it works with real data.

This video demonstrates how to import ML Croissant data into ApertureDB.

Creating the workflow

Creating and deleting workflows
For general information on creating workflows in ApertureDB Cloud see Creating and Deleting Workflows.
  1. Enter the public URL of an ML Croissant dataset, for example MNIT-CoT-Dataset or text-2-video-human-preferences-veo3, both on Hugging Face.
  2. Click the blue button at the bottom.
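Before submitting a URL, it can help to confirm that it really returns Croissant metadata. The following is a minimal sketch, not part of the workflow itself: it assumes the URL serves a Croissant JSON-LD document with the standard top-level `@type`, `name`, `distribution`, and `recordSet` keys. The helper names `fetch_croissant` and `summarize_croissant` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request


def fetch_croissant(url: str, timeout: float = 30.0) -> dict:
    """Download a Croissant JSON-LD document and parse it into a dict."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def _as_list(value) -> list:
    """JSON-LD may hold a single object where a list is expected."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarize_croissant(metadata: dict) -> dict:
    """Return the dataset name and the names of its files and record sets.

    Assumption: the document uses the usual Croissant keys
    ("distribution" for files, "recordSet" for record sets).
    Raises ValueError if the top-level @type is not a Dataset.
    """
    if "Dataset" not in str(metadata.get("@type", "")):
        raise ValueError("top-level @type is not a Dataset")

    def names(items) -> list:
        # Fall back to "@id" when an entry has no "name".
        return [item.get("name") or item.get("@id") for item in _as_list(items)]

    return {
        "name": metadata.get("name"),
        "files": names(metadata.get("distribution")),
        "record_sets": names(metadata.get("recordSet")),
    }
```

Calling `summarize_croissant(fetch_croissant(url))` with the URL you intend to enter in step 1 should list at least one record set. If it raises an error or the record set list is empty, the URL probably does not point at usable Croissant metadata.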
See the GitHub repository for more information
For more detailed information about what this workflow is doing, additional information about the parameters, and how to run the workflow outside of the ApertureDB Cloud, see the dataset-ingestion documentation in GitHub.
  • Huggingface Dataset
    • Huggingface croissant link
  • Kaggle Dataset
    • Kaggle croissant link

See the results

On the "My Instances" page, click "Connect" for the instance you used to see an option to open the Web UI for that instance. The number of objects in the database increases as the workflow runs; click the refresh button to update the count.