KaggleData
KaggleData Objects
class KaggleData(Subscriptable)
Class that wraps a dataset retrieved from Kaggle.
A dataset downloaded from Kaggle does not implement a standard mechanism for iterating over its values. This class provides an abstraction similar to a PyTorch dataset, where iterating over the dataset yields one atomic record at a time.
Note:
This class should be subclassed with specific implementations of generate_index and generate_query.
Example subclass: CelebADataKaggle
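The subclassing pattern described above can be sketched as follows. MiniKaggleData and ToyData are hypothetical names, and MiniKaggleData is a stand-in written only for illustration: it is not the aperturedb KaggleData class and it does not download anything from Kaggle. The sketch assumes the index is kept in an attribute named collection and that indexing the object calls generate_query; this mirrors the PyTorch-style access described above but is not taken from the library source.

```python
from typing import List, Tuple

import pandas as pd


class MiniKaggleData:
    """Hypothetical stand-in for KaggleData (not the aperturedb class).

    Holds an index DataFrame and yields one atomic record per position,
    like a map-style PyTorch dataset."""

    def __init__(self, root: str, records_count: int = -1):
        # Build the index once; subclasses decide what a "record" is.
        self.collection = self.generate_index(root, records_count)

    def generate_index(self, root: str, records_count: int = -1) -> pd.DataFrame:
        raise NotImplementedError

    def generate_query(self, idx: int) -> Tuple[List[dict], List[bytes]]:
        raise NotImplementedError

    def __len__(self) -> int:
        return len(self.collection)

    def __getitem__(self, idx: int) -> Tuple[List[dict], List[bytes]]:
        # Raising IndexError past the end lets a plain for loop terminate.
        if idx < 0 or idx >= len(self):
            raise IndexError(idx)
        return self.generate_query(idx)


class ToyData(MiniKaggleData):
    """Hypothetical subclass: each record is one labelled row."""

    def generate_index(self, root: str, records_count: int = -1) -> pd.DataFrame:
        # A real subclass would read the files downloaded under `root`;
        # this toy version ignores it and builds a tiny in-memory index.
        df = pd.DataFrame({"name": ["a", "b", "c"], "label": [0, 1, 0]})
        return df if records_count < 0 else df.head(records_count)

    def generate_query(self, idx: int) -> Tuple[List[dict], List[bytes]]:
        row = self.collection.iloc[idx]
        props = {"name": row["name"], "label": int(row["label"])}
        # One ApertureDB command per record, no blobs needed for an entity.
        return [{"AddEntity": {"class": "Toy", "properties": props}}], []


if __name__ == "__main__":
    for commands, blobs in ToyData(root="unused", records_count=2):
        print(commands)
```

Because __getitem__ raises IndexError past the last record, a plain for loop works through Python's legacy sequence-iteration protocol, so each step of the loop yields one (commands, blobs) pair for one record.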
Arguments:
dataset_ref
str - URL of the Kaggle dataset, for example https://www.kaggle.com/datasets/jessicali9530/celeba-dataset
records_count
int - number of records to pass to generate_index.
generate_index
def generate_index(root: str, records_count: int = -1) -> pd.DataFrame
Generates an index through which each record downloaded under root can be accessed.
Arguments:
root
str - Path to which Kaggle downloads the dataset.
Returns:
pd.DataFrame
- The data loaded into a DataFrame.
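What generate_index does depends entirely on how the particular Kaggle dataset lays out its files. As a hypothetical sketch, assuming the records are image files located anywhere under root, and assuming records_count=-1 means "index everything", it could walk the directory and build one DataFrame row per image. The column names filename and path are illustrative choices, not part of the documented interface.

```python
import os

import pandas as pd


def generate_index(root: str, records_count: int = -1) -> pd.DataFrame:
    """Hypothetical sketch: one row per image file found under `root`."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith((".jpg", ".jpeg", ".png")):
                paths.append(os.path.join(dirpath, name))
    # Sort so that a given idx refers to the same record on every run.
    paths.sort()
    # Assumption: a negative records_count (the -1 default) means "all records".
    if records_count >= 0:
        paths = paths[:records_count]
    return pd.DataFrame({
        "filename": [os.path.basename(p) for p in paths],
        "path": paths,
    })
```

Sorting the paths matters because os.walk makes no ordering guarantee, and generate_query later addresses records purely by integer position.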
generate_query
def generate_query(idx: int) -> Tuple[List[dict], List[bytes]]
Takes the information from one atomic record of the data and converts it into a query for ApertureDB.
Arguments:
idx
int - index of the record in the collection.
Raises:
Exception
- description
Returns:
Tuple[List[dict], List[bytes]]
- A pair consisting of a list of commands and an optional list of blobs to go with them.
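A matching generate_query might turn one row of the index into an ApertureDB AddImage command plus the image bytes that command consumes. This is a hypothetical sketch: it is written as a free function that takes the index DataFrame explicitly instead of reading it from self, and it assumes the index has a path column pointing at the image file, with every other column stored as an image property.

```python
from typing import List, Tuple

import pandas as pd


def generate_query(collection: pd.DataFrame, idx: int) -> Tuple[List[dict], List[bytes]]:
    """Hypothetical sketch: row `idx` -> one AddImage command + its blob."""
    row = collection.iloc[idx]
    with open(row["path"], "rb") as f:
        blob = f.read()
    # Every column except the file path becomes an image property.
    # NumPy scalars are converted with .item() so the query is JSON-friendly.
    properties = {
        k: (v.item() if hasattr(v, "item") else v)
        for k, v in row.items()
        if k != "path"
    }
    commands = [{"AddImage": {"properties": properties}}]
    return commands, [blob]
```

Keeping the commands and blobs as two parallel lists matches the documented return type: each blob is consumed, in order, by the command in the list that expects one.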