
KaggleData

KaggleData Objects

class KaggleData(Subscriptable)

Class to wrap around a dataset retrieved from Kaggle.

A dataset downloaded from Kaggle does not implement a standard mechanism for iterating over its values. This class provides an abstraction similar to a PyTorch dataset, in which iterating over the dataset yields one atomic record at a time.
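The pattern described above can be sketched with a small stand-in class. This is not the aperturedb implementation: `KaggleDataSketch`, its constructor, and the `AddEntity` command with a `"Record"` class are hypothetical choices for illustration. It relies on Python's sequence protocol: a class that defines `__getitem__` and raises `IndexError` past the end can be iterated with a plain `for` loop, and each step yields one atomic record converted into a `(commands, blobs)` pair.

```python
from typing import List, Tuple


class KaggleDataSketch:
    """Hypothetical stand-in illustrating the iteration pattern; not aperturedb code."""

    def __init__(self, records: List[dict]) -> None:
        # The "index": one entry per atomic record in the download.
        self.records = records

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> Tuple[List[dict], List[bytes]]:
        # Raising IndexError past the end lets a plain for loop stop cleanly.
        if not 0 <= idx < len(self.records):
            raise IndexError(idx)
        return self.generate_query(idx)

    def generate_query(self, idx: int) -> Tuple[List[dict], List[bytes]]:
        # Assumed ApertureDB command shape; this record carries no blob.
        record = self.records[idx]
        return [{"AddEntity": {"class": "Record", "properties": record}}], []


# Usage: each iteration step yields one (commands, blobs) pair.
ds = KaggleDataSketch([{"name": "a"}, {"name": "b"}])
for commands, blobs in ds:
    print(commands, blobs)
```

Because iteration and indexing both go through `generate_query`, a loader only needs the record count and a per-record conversion, much like a PyTorch `Dataset` needs only `__len__` and `__getitem__`.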

note

This class should be subclassed with specific implementations of generate_index and generate_query.

Example subclass: CelebADataKaggle
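The subclassing contract can be sketched with Python's `abc` module. This is a hypothetical mirror of the contract, not the real `KaggleData` base: `KaggleDataBase`, `ListKaggleData`, the `collection` attribute, and the assumption that the constructor builds the index once via `generate_index(root, records_count)` are all illustrative. Marking both hooks abstract means a subclass that forgets one of them cannot be instantiated.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class KaggleDataBase(ABC):
    """Hypothetical abstract base mirroring the two hooks a subclass provides."""

    def __init__(self, root: str, records_count: int = -1) -> None:
        # Assumption: the index is built once, up front, from the download directory.
        self.collection = self.generate_index(root, records_count)

    @abstractmethod
    def generate_index(self, root: str, records_count: int = -1):
        """Return one entry per atomic record found under root."""

    @abstractmethod
    def generate_query(self, idx: int) -> Tuple[List[dict], List[bytes]]:
        """Convert record idx into (commands, blobs)."""


class ListKaggleData(KaggleDataBase):
    """Hypothetical subclass: fabricates three file paths instead of scanning root."""

    def generate_index(self, root: str, records_count: int = -1):
        records = [{"path": f"{root}/{i}.jpg"} for i in range(3)]
        # Assumption: a negative records_count means "keep every record".
        return records if records_count < 0 else records[:records_count]

    def generate_query(self, idx: int) -> Tuple[List[dict], List[bytes]]:
        # Assumed ApertureDB command shape; no blob is attached here.
        return [{"AddEntity": {"class": "File", "properties": self.collection[idx]}}], []
```

Usage: `ListKaggleData("/data", records_count=2).generate_query(1)` returns a single `AddEntity` command for `/data/1.jpg` and an empty blob list.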


generate_index

def generate_index(root: str, records_count: int = -1) -> pd.DataFrame

Generate an index that provides access to each record downloaded to root.

Arguments:

  • root str - Path to which Kaggle downloads the dataset.

Returns:

  • pd.DataFrame - The data loaded into a DataFrame.
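A `generate_index` implementation might look like the following standalone sketch. The download layout is hypothetical: it assumes a `metadata.csv` under `root` with one row per record and an `image_id` column naming a file in `root/images/`. It also assumes a negative `records_count` (the default `-1`) means "keep all records"; that reading of the default is an assumption, not documented behavior.

```python
import os

import pandas as pd


def generate_index(root: str, records_count: int = -1) -> pd.DataFrame:
    # Hypothetical layout: <root>/metadata.csv, one row per record, with an
    # "image_id" column naming a file under <root>/images/.
    df = pd.read_csv(os.path.join(root, "metadata.csv"))
    # Assumption: a negative records_count means "keep every record".
    # Guard explicitly, because df.head(-1) would drop the last row instead.
    if records_count >= 0:
        df = df.head(records_count).copy()
    # Resolve each record to a full file path so generate_query can open it.
    df["path"] = [os.path.join(root, "images", name) for name in df["image_id"]]
    return df.reset_index(drop=True)
```

The explicit guard matters because `DataFrame.head` treats a negative `n` as "all rows except the last `|n|`", so passing the default `-1` straight through would silently lose a record.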

generate_query

def generate_query(idx: int) -> Tuple[List[dict], List[bytes]]

Take the information from one atomic record of the data and convert it into a query for ApertureDB.

Arguments:

  • idx int - Index of the record in the collection.

Raises:

  • Exception - description

Returns:

  • Tuple[List[dict], List[bytes]] - A pair consisting of a list of commands and an optional list of blobs to go with them.
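A `generate_query` implementation can be sketched as a standalone function that turns one record into an ApertureDB `AddImage` command plus the image bytes as its blob. For self-containment the index is passed in as a list of dicts with a `path` key (hypothetical); in a subclass the record would instead come from the DataFrame built by `generate_index`, for example via `df.iloc[idx].to_dict()`. The command shape is assumed from ApertureDB's JSON query format, where a blob-consuming command such as `AddImage` takes its data from the blob list in order.

```python
from typing import Dict, List, Tuple


def generate_query(index: List[Dict[str, str]], idx: int) -> Tuple[List[dict], List[bytes]]:
    # Indexing a list past its end raises IndexError, a natural failure mode
    # for an out-of-range idx.
    record = index[idx]
    # Read the raw file; the image data travels as a blob, not inside the JSON.
    with open(record["path"], "rb") as f:
        blob = f.read()
    # Every other field of the record becomes a property of the stored image.
    properties = {k: v for k, v in record.items() if k != "path"}
    query = [{"AddImage": {"properties": properties}}]
    # One AddImage command, one blob: the two lists line up positionally.
    return query, [blob]
```

Keeping the file path out of `properties` is a design choice for the sketch: the path is only meaningful on the machine that downloaded the dataset, while the remaining metadata is what a later query would filter on.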