
BlobNewestDataCSV

BlobNewestDataCSV Objects

class BlobNewestDataCSV(CSVParser.CSVParser)

ApertureDB General CSV Parser for Maintaining a Blob set with changing blob data.

Use this only when deleting entities is ok.

Updates an entity that has an associated blob so that it matches the data in the CSV. This means:

  • If it doesn't exist, add it.

  • If it exists and the blob hasn't changed, update it.

  • If it exists and the blob has changed, delete and re-add it.

    If these entities are part of a graph and are linked by connections, those connections will need to be regenerated afterwards.

    Additionally, note that this loader does not notice if an entity is removed.

  • If you wish to clean up removed entities, use a load id that increases with each load: every entity still present in the CSV is updated to the newest load id, and entities left with an old load id can then be deleted.

This class uses three kinds of conditionals:

  • a normal constraint_ to select the element

  • a series of updateif_ to determine whether an update is necessary

  • one or more prop_ and the associated updateif blob conditionals to determine whether an update or a delete/add is appropriate

Generated fields

Format is: gen_[type]_name

type:

  • blobsha1 - the SHA-1 of the blob is calculated

  • blobsize - the length of the blob in bytes is calculated

  • insertdate - the date in ISO format (this will always change!)

The result is then used to identify whether a blob has changed.

The generated fields are to be left empty in the CSV input.
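As a rough illustration of what the three generated field types describe, the values could be computed along these lines. This is a minimal sketch using only the Python standard library; the function name `generated_fields` is hypothetical, and the exact encoding the loader uses (for example hex digest, timezone of the date) is an assumption, not something this page specifies.

```python
import hashlib
from datetime import datetime, timezone


def generated_fields(blob: bytes) -> dict:
    """Compute illustrative values for the three generated field types."""
    return {
        # blobsha1: SHA-1 of the blob bytes (hex encoding is an assumption)
        "blobsha1": hashlib.sha1(blob).hexdigest(),
        # blobsize: length of the blob in bytes
        "blobsize": len(blob),
        # insertdate: ISO-format timestamp; it differs on every call,
        # which is why this field "will always change"
        "insertdate": datetime.now(timezone.utc).isoformat(),
    }


fields = generated_fields(b"hello")
```

Because blobsha1 and blobsize depend only on the blob bytes, they stay stable across loads of unchanged data, while insertdate does not.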

Summary

This class requires a constraint to check whether an id exists, and a generated prop to detect whether the blob matches. It ensures that only one entity exists that satisfies the constraints and matches the managed blob constraints.
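The add / update / delete-and-re-add behavior described above can be sketched as a small decision function. This is not the library's implementation: `choose_action`, its parameters, and the dictionary representation of an existing entity are all hypothetical, and the sketch ignores the updateif_ conditions that decide whether an update is needed at all.

```python
from typing import Optional


def choose_action(existing: Optional[dict], new_sha1: str,
                  sha_prop: str = "sha") -> str:
    """Return 'add', 'update', or 'delete_and_add' for one CSV row.

    existing: properties of the entity matched by the constraint_ columns,
              or None if no entity matched (hypothetical representation).
    new_sha1: SHA-1 computed for the blob referenced by the CSV row.
    sha_prop: name of the property holding the stored SHA-1.
    """
    if existing is None:
        # Nothing matched the constraint: the entity does not exist yet.
        return "add"
    if existing.get(sha_prop) == new_sha1:
        # Same blob: only the properties need to be updated.
        return "update"
    # Blob changed: the entity is deleted and re-added with the new blob.
    return "delete_and_add"
```

For example, `choose_action(None, "abc")` selects an add, while an existing entity whose stored hash differs from the new one selects the delete-and-re-add path, which is the case where connections have to be regenerated.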

note

This class is backed by a CSV file with the following columns (format is optional):

``filename``, ``PROP_NAME_1``, ... ``PROP_NAME_N``, ``constraint_PROP1``, ``format``

OR

``url``, ``PROP_NAME_1``, ... ``PROP_NAME_N``, ``constraint_PROP1``, ``format``

OR

``s3_url``, ``PROP_NAME_1``, ... ``PROP_NAME_N``, ``constraint_PROP1``, ``format``

OR

``gs_url``, ``PROP_NAME_1``, ... ``PROP_NAME_N``, ``constraint_PROP1``, ``format``
...

Example CSV file::

filename,id,label,constraint_id,format,dataset_ver,updateif>_dataset_ver,gen_blobsha1_sha
/home/user/file1.jpg,321423532,dog,321423532,jpg,2,2,
/home/user/file2.jpg,42342522,cat,42342522,png,2,2,
...
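A CSV in this shape could be produced with Python's standard `csv` module, as in the sketch below. The column names are taken from the example above; the `rows` list and the `dataset_ver` variable are illustrative assumptions. Note that the generated column (gen_blobsha1_sha) is written as an empty string, since generated fields are left empty in the input.

```python
import csv
import io

# Column order copied from the example CSV above.
columns = ["filename", "id", "label", "constraint_id", "format",
           "dataset_ver", "updateif>_dataset_ver", "gen_blobsha1_sha"]

# Illustrative input rows: (filename, id, label, format).
rows = [
    ("/home/user/file1.jpg", 321423532, "dog", "jpg"),
    ("/home/user/file2.jpg", 42342522, "cat", "png"),
]
dataset_ver = 2  # hypothetical version number for this load

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(columns)
for filename, entity_id, label, fmt in rows:
    # constraint_id repeats the id; the generated column stays empty.
    writer.writerow([filename, entity_id, label, entity_id, fmt,
                     dataset_ver, dataset_ver, ""])

csv_text = buffer.getvalue()
```

Writing to a file instead of an in-memory buffer only requires replacing `io.StringIO()` with `open(path, "w", newline="")`.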

Example usage:


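from aperturedb.ImageDataCSV import ImageForceNewestDataCSV
from aperturedb.ParallelLoader import ParallelLoader

# client is assumed to be an existing ApertureDB connection
data = ImageForceNewestDataCSV("/path/to/WorkingImageDataset.csv")
loader = ParallelLoader(client)
loader.ingest(data)

info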

In the above example, constraint_id ensures that an Image with the specified id is inserted only if it does not already exist in the database.