datasets
This module contains the scripts that download and process every dataset available in dbcollection.
These scripts are self-contained, so they can be imported and used to set up a dataset manually.
Constructors: Classes
BaseDataset
class dbcollection.datasets.BaseDataset(data_path, cache_path, extract_data=True, verbose=True)
Base class for downloading and processing a dataset.
Parameters: - data_path (str) – Path to the data directory.
- cache_path (str) – Path to the cache file.
- extract_data (bool, optional) – Extract the downloaded files if they are compressed.
- verbose (bool, optional) – Be verbose.
Variables: - data_path (str) – Path to the data directory.
- cache_path (str) – Path to the cache file.
- extract_data (bool, optional) – Extract the downloaded files if they are compressed.
- verbose (bool) – Be verbose.
- urls (list) – List of URL links to download.
- keywords (list) – List of keywords.
- tasks (dict) – Dataset’s tasks.
- default_task (str) – Default task name.
download()
Download and extract files to disk.
Returns: The dataset’s keywords. Return type: tuple
get_task_constructor(task)
Return the class constructor for the input task.
Parameters: task (str) – Task name.
Returns: - str – Task name.
- str – Task’s ending suffix (if any).
- BaseTask – Constructor to process the metadata of a task.
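The interface documented above can be pictured with a small stand-in class. This is only a sketch and not dbcollection's implementation: `SketchDataset`, `ClassificationTask`, the example URL and keywords, the `known_suffixes` attribute, the fallback to `default_task` for an empty task name, and the suffix-splitting rule are all assumptions made for illustration. Downloading and extracting files is omitted.

```python
class ClassificationTask:
    """Hypothetical task class standing in for a BaseTask subclass."""

    def __init__(self, data_path, cache_path, suffix=None, verbose=True):
        self.data_path = data_path
        self.cache_path = cache_path
        self.suffix = suffix
        self.verbose = verbose


class SketchDataset:
    """Stand-in that mirrors the documented BaseDataset attributes."""

    # Hypothetical values; a real dataset class would define its own.
    urls = ["https://example.com/data.tar.gz"]
    keywords = ("image_processing", "classification")
    tasks = {"classification": ClassificationTask}
    default_task = "classification"
    known_suffixes = ("_s",)  # assumption: suffixes this dataset recognizes

    def __init__(self, data_path, cache_path, extract_data=True, verbose=True):
        self.data_path = data_path
        self.cache_path = cache_path
        self.extract_data = extract_data
        self.verbose = verbose

    def download(self):
        # Fetching and extracting self.urls is left out of this sketch.
        return self.keywords

    def get_task_constructor(self, task):
        # Assumption: an empty task name selects the default task.
        name = task if task else self.default_task
        suffix = ""
        for candidate in self.known_suffixes:
            # Assumption: a suffix is split off only for unregistered names.
            if name not in self.tasks and name.endswith(candidate):
                name, suffix = name[: -len(candidate)], candidate
                break
        return name, suffix, self.tasks[name]


dataset = SketchDataset("/tmp/data", "/tmp/cache")
name, suffix, constructor = dataset.get_task_constructor("classification_s")
task = constructor(dataset.data_path, dataset.cache_path, suffix=suffix)
```

The three returned values match the documented return signature: the task name, its suffix (an empty string here when there is none), and the class used to build the task.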
BaseTask
class dbcollection.datasets.BaseTask(data_path, cache_path, suffix=None, verbose=True)
Base class for processing a task of a dataset.
Parameters: - data_path (str) – Path to the data directory.
- cache_path (str) – Path to the cache file.
- suffix (str, optional) – Suffix to select optional properties for a task.
- verbose (bool, optional) – Be verbose.
Variables: - data_path (str) – Path to the data directory.
- cache_path (str) – Path to the cache file.
- suffix (str, optional) – Suffix to select optional properties for a task.
- verbose (bool, optional) – Be verbose.
- filename_h5 (str) – HDF5 metadata file name.
add_data_to_default(handler, data, set_name=None)
Add the data of a set to the default group.
For each field, the data is organized into a single large matrix.
Parameters: - handler (h5py._hl.group.Group) – HDF5 group object handler.
- data (list/dict) – List or dict containing the data annotations of a particular set or sets.
- set_name (str, optional) – Set name.
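To illustrate what one matrix per field can look like, the sketch below stacks per-sample annotations into a row-per-sample table for each field. It uses plain Python lists in place of HDF5 datasets. The helper `stack_fields` and the sample field names are hypothetical and are not part of dbcollection.

```python
def stack_fields(samples):
    """Group a list of per-sample dicts into one matrix (list of rows) per field."""
    matrices = {}
    for sample in samples:
        for field, value in sample.items():
            # Scalars become single-column rows so every field is two-dimensional.
            row = list(value) if isinstance(value, (list, tuple)) else [value]
            matrices.setdefault(field, []).append(row)
    return matrices


# Hypothetical annotations for a "train" set.
train = [
    {"class_id": 0, "bbox": [10, 20, 50, 60]},
    {"class_id": 3, "bbox": [5, 5, 40, 80]},
]
fields = stack_fields(train)
```

Each key of `fields` then holds one matrix whose i-th row belongs to the i-th sample, which is the flat layout the description above suggests.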
add_data_to_source(hdf5_handler, data, set_name=None)
Store data annotations as a nested tree.
The stored layout closely follows the tree structure of the data.
Parameters: - hdf5_handler (h5py._hl.group.Group) – HDF5 group object handler.
- data (list/dict) – List or dict containing the data annotations of a particular set or sets.
- set_name (str, optional) – Set name.
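The nested layout can be pictured as one path per level of the annotation tree. The sketch below flattens a nested dict or list into path/value pairs, similar to how nested HDF5 groups are addressed. The helper `flatten_tree` and the sample data are hypothetical; the documented method takes an h5py group handler rather than returning a dict.

```python
def flatten_tree(data, prefix=""):
    """Map each leaf of a nested dict/list to a slash-separated path."""
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, list):
        items = enumerate(data)
    else:
        return {prefix: data}  # leaf value: stop recursing
    flat = {}
    for key, value in items:
        flat.update(flatten_tree(value, f"{prefix}/{key}"))
    return flat


# Hypothetical annotations: one set containing one sample.
annotations = {"train": [{"filename": "a.jpg", "label": 1}]}
paths = flatten_tree(annotations)
```

Here every leaf keeps the full chain of keys and list indices that leads to it, so the stored structure mirrors the input tree instead of merging samples into per-field matrices.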