KaggleData

KaggleData Objects

class KaggleData(Subscriptable)

Class that wraps a dataset retrieved from Kaggle.

A dataset downloaded from Kaggle does not implement a standard mechanism for iterating over its values. This class provides an abstraction similar to a PyTorch dataset, where iterating over the dataset's elements yields one atomic record at a time.

Note: This class should be subclassed with specific implementations of generate_index and generate_query.

Example subclass: CelebADataKaggle

Arguments:

dataset_ref str - URL of the Kaggle dataset, for example https://www.kaggle.com/datasets/jessicali9530/celeba-dataset
records_count int - Number of records to generate from the dataset.

generate_index

def generate_index(root: str, records_count: int = -1) -> pd.DataFrame

Build an index that gives access to each record downloaded under root.

Arguments:

root str - Path to which Kaggle downloads the dataset.

Returns:

pd.DataFrame - The data loaded into a DataFrame.
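
As an illustration only, a generate_index implementation might look like the sketch below. It is written as a standalone function rather than a method, and it assumes the downloaded dataset contains a metadata file named records.csv (a hypothetical name) and that a positive records_count caps the number of rows while the default of -1 keeps all of them. Neither assumption comes from the library; a real subclass follows the layout of its own dataset.

```python
import os
import tempfile

import pandas as pd


def generate_index(root: str, records_count: int = -1) -> pd.DataFrame:
    """Load the metadata CSV found under root, one row per record."""
    # Hypothetical file name; a real dataset defines its own layout.
    index = pd.read_csv(os.path.join(root, "records.csv"))
    # Assumption: a positive records_count caps the number of rows,
    # and the default of -1 keeps every row.
    if records_count > 0:
        index = index.head(records_count)
    return index


# Stand-in for a downloaded dataset: a temporary folder holding a tiny CSV.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "records.csv"), "w") as f:
        f.write("image_id,label\n")
        f.write("000001.jpg,smiling\n")
        f.write("000002.jpg,neutral\n")
        f.write("000003.jpg,smiling\n")
    full_index = generate_index(root)
    short_index = generate_index(root, records_count=2)
```

Each row of the returned DataFrame then corresponds to one atomic record that generate_query can look up by position.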

generate_query

def generate_query(idx: int) -> Tuple[List[dict], List[bytes]]

Takes one atomic record from the data and converts it into a query for ApertureDB.

Arguments:

idx int - Index of the record in the collection.

Raises:

Exception - description

Returns:

Tuple[List[dict], List[bytes]]: A pair consisting of a list of commands and an optional list of blobs to go with them.
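
To make the shape of the return value concrete, here is a minimal sketch of a generate_query implementation. It is a standalone function over a hypothetical in-memory RECORDS list (in a subclass the record would more likely come from the index built by generate_index), and the property names image_id and label are invented for illustration. It emits one AddImage command per record, paired with that record's bytes.

```python
from typing import List, Tuple

# Hypothetical records standing in for rows of the generated index.
RECORDS = [
    {"image_id": "000001.jpg", "label": "smiling", "data": b"fake-image-bytes-1"},
    {"image_id": "000002.jpg", "label": "neutral", "data": b"fake-image-bytes-2"},
]


def generate_query(idx: int) -> Tuple[List[dict], List[bytes]]:
    """Convert the record at position idx into commands plus their blobs."""
    record = RECORDS[idx]
    # One command describing the image and the properties to store with it.
    commands = [
        {
            "AddImage": {
                "properties": {
                    "image_id": record["image_id"],
                    "label": record["label"],
                }
            }
        }
    ]
    # One blob for the single blob-carrying command above.
    blobs = [record["data"]]
    return commands, blobs


commands, blobs = generate_query(0)
```

In this sketch the blobs list lines up with the commands that need binary data, so the single AddImage command is matched by a single blob.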
