utils
Utility methods for url download, file extraction, data padding and parsing, testing, etc.
Also, all third-party submodules are located under this module.
URL download
Download functions.
dbcollection.utils.url.check_if_url_files_exist(urls, save_dir)
Checks whether all URL filenames exist on disk.
Parameters:
- urls (list/tuple/dict) – URL paths.
- save_dir (str) – Directory to store the downloaded data.
dbcollection.utils.url.download_extract_urls(urls, save_dir, extract_data=True, verbose=True)
Downloads the URLs and extracts the files to disk.
Parameters:
- urls (list/tuple/dict) – URL paths.
- save_dir (str) – Directory to store the downloaded data.
- extract_data (bool, optional) – Extracts/unpacks the data files if True.
- verbose (bool, optional) – Displays messages on screen if True.
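The existence check described above can be illustrated with a minimal sketch. The helper name url_filenames_exist is hypothetical and is not part of dbcollection; it also assumes urls is a flat list or tuple of URL strings, whereas the documented function accepts dicts as well.

```python
import os
from urllib.parse import urlparse


def url_filenames_exist(urls, save_dir):
    """Return True if every URL's file name is present in save_dir.

    Hypothetical helper: assumes `urls` is a flat list/tuple of strings.
    """
    for url in urls:
        # Use the last component of the URL path as the file name.
        fname = os.path.basename(urlparse(url).path)
        if not os.path.exists(os.path.join(save_dir, fname)):
            return False
    return True
```

A check of this kind lets a download step be skipped when the files are already on disk.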
File loading
Library to load different types of files into memory.
dbcollection.utils.file_load.load_json(fname)
Loads a JSON file into memory.
Parameters: fname (str) – File name + path.
Returns: Data structure of the input JSON file.
Return type: dict/list
dbcollection.utils.file_load.load_matlab(fname)
Loads a MATLAB file into memory.
Parameters: fname (str) – File name + path.
Returns: Data structure of the input MATLAB file.
Return type: dict/list
dbcollection.utils.file_load.load_pickle(fname)
Loads a pickle file into memory.
Parameters: fname (str) – File name + path.
Returns: Data structure of the input file.
Return type: dict/list
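For orientation, loaders of this kind are usually thin wrappers around the standard library. The sketch below covers only the JSON and pickle cases; the function names are hypothetical, and it is an assumption that the library's own loaders behave the same way.

```python
import json
import pickle


def load_json_sketch(fname):
    """Read a JSON file and return the decoded dict/list."""
    with open(fname, "r") as f:
        return json.load(f)


def load_pickle_sketch(fname):
    """Read a pickle file (binary mode) and return the stored object."""
    with open(fname, "rb") as f:
        return pickle.load(f)
```

Only unpickle files from trusted sources, since pickle can execute arbitrary code while loading.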
Padding
Library of methods for padding/unpadding lists or lists of lists with fill values.
dbcollection.utils.pad.pad_list(listA, val=-1, length=None)
Pads a list of lists with ‘val’ so that all lists have the same length.
Parameters:
- listA (list) – List of lists of different sizes.
- val (number, optional) – Value to pad the lists.
- length (number, optional) – Total length of each padded list.
Returns: A list of lists with the same length.
Return type: list
Examples
Pad an uneven list of lists with a value.
>>> from dbcollection.utils.pad import pad_list
>>> pad_list([[0,1,2,3],[4,5,6],[7,8],[9]])  # pad with -1 (default)
[[0, 1, 2, 3], [4, 5, 6, -1], [7, 8, -1, -1], [9, -1, -1, -1]]
>>> pad_list([[1,2],[3,4]])  # does nothing
[[1, 2], [3, 4]]
>>> pad_list([[],[1],[3,4,5]], 0)  # pad lists with 0
[[0, 0, 0], [1, 0, 0], [3, 4, 5]]
>>> pad_list([[],[1],[3,4,5]], 0, 6)  # pad lists with 0 of size 6
[[0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0], [3, 4, 5, 0, 0, 0]]
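The padding behaviour shown in these examples can be reproduced with a short sketch. The name pad_list_sketch is hypothetical and this is not necessarily how the library implements it; how the real function treats a length shorter than the longest list is not documented, and here such rows are simply left unchanged.

```python
def pad_list_sketch(list_a, val=-1, length=None):
    """Pad every inner list with `val` up to a common length."""
    # Target length: explicit `length` if given, else the longest inner list.
    longest = max((len(row) for row in list_a), default=0)
    target = longest if length is None else length
    # A negative repeat count yields an empty list, so rows that are
    # already long enough are copied unchanged.
    return [list(row) + [val] * (target - len(row)) for row in list_a]
```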
dbcollection.utils.pad.unpad_list(listA, val=-1)
Unpads a list of lists by removing the values equal to ‘val’.
Parameters:
- listA (list) – List of lists of equal sizes.
- val (number, optional) – Value to remove from the lists.
Returns: A list of lists without the padding values.
Return type: list
Examples
Remove the padding values of a list of lists.
>>> from dbcollection.utils.pad import unpad_list
>>> unpad_list([[1,2,3,-1,-1],[5,6,-1,-1,-1]])
[[1, 2, 3], [5, 6]]
>>> unpad_list([[5,0,-1],[1,2,3,4,5]], 5)
[[0, -1], [1, 2, 3, 4]]
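As the second example shows, every element equal to the padding value is removed, not only trailing ones. A minimal sketch consistent with these examples follows; the name unpad_list_sketch is hypothetical and the library's own implementation may differ.

```python
def unpad_list_sketch(list_a, val=-1):
    """Drop every element equal to `val` from each inner list."""
    return [[x for x in row if x != val] for row in list_a]
```

Because of this, the padding value should be one that never occurs in the real data.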
dbcollection.utils.pad.squeeze_list(listA, val=-1)
Compacts a list of lists into a single list.
Squeezes (spaghettifies) a list of lists into a single list. The lists are concatenated, and a separating value is inserted between them to mark the split locations for unsqueezing.
Parameters:
- listA (list) – List of lists.
- val (number, optional) – Value to separate the lists.
Returns: A list with all lists concatenated into one.
Return type: list
Examples
Compact a list of lists into a single list.
>>> from dbcollection.utils.pad import squeeze_list
>>> squeeze_list([[1,2], [3], [4,5,6]], -1)
[1, 2, -1, 3, -1, 4, 5, 6]
dbcollection.utils.pad.unsqueeze_list(listA, val=-1)
Unpacks a list into a list of lists.
Returns a list of lists by splitting the input list wherever an element equal to ‘val’ is encountered. Empty lists resulting from trailing values at the end of the list are discarded.
Source: https://stackoverflow.com/questions/4322705/split-a-list-into-nested-lists-on-a-value
Parameters:
- listA (list) – A list.
- val (int/float, optional) – Value to separate the lists.
Returns: A list of lists.
Return type: list
Examples
Unpack a list into a list of lists.
>>> from dbcollection.utils.pad import unsqueeze_list
>>> unsqueeze_list([1, 2, -1, 3, -1, 4, 5, 6], -1)
[[1, 2], [3], [4, 5, 6]]
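Squeezing and unsqueezing are inverse operations, which the following sketch illustrates as a round trip. Both function names are hypothetical. The splitting step uses itertools.groupby, and it is an assumption that the library follows the same approach.

```python
from itertools import groupby


def squeeze_list_sketch(list_a, val=-1):
    """Concatenate inner lists, inserting `val` between consecutive lists."""
    flat = []
    for i, row in enumerate(list_a):
        if i > 0:
            flat.append(val)  # separator marks where the next list starts
        flat.extend(row)
    return flat


def unsqueeze_list_sketch(flat, val=-1):
    """Split a flat list on `val`, keeping only the non-separator runs."""
    return [
        list(group)
        for is_sep, group in groupby(flat, lambda x: x == val)
        if not is_sep
    ]
```

In this sketch, runs of separators yield no empty lists, so an empty inner list does not survive the round trip.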
String<->ASCII
String-to-ASCII and ASCII-to-string conversion methods.
dbcollection.utils.string_ascii.convert_str_to_ascii(inp_str)
Converts a string or a list of strings into an ASCII-encoded numpy array.
Converts a string or list of strings to a numpy array. The array size is the string length plus one. The extra element is needed for ASCII-to-string conversion in Lua using ffi.string(), which expects a 0 at the end of an array.
If a list of strings is used, the array width is the length of the longest string (plus one), and shorter strings are zero-padded to maintain the array shape.
Parameters: inp_str (str/list/tuple) – String or list of strings to convert to an ASCII array.
Returns: Single/multi-dimensional array of ASCII-encoded strings.
Return type: np.ndarray
Examples
Example 1: Convert a string to a numpy array encoded into ASCII values.
>>> from dbcollection.utils.string_ascii import convert_str_to_ascii
>>> convert_str_to_ascii('string1')
array([115, 116, 114, 105, 110, 103, 49, 0], dtype=uint8)
Example 2: Convert a list of strings into an ASCII array.
>>> from dbcollection.utils.string_ascii import convert_str_to_ascii
>>> convert_str_to_ascii(['string1', 'string2', 'string3'])
array([[115, 116, 114, 105, 110, 103, 49, 0],
       [115, 116, 114, 105, 110, 103, 50, 0],
       [115, 116, 114, 105, 110, 103, 51, 0]], dtype=uint8)
dbcollection.utils.string_ascii.convert_ascii_to_str(input_array)
Converts a numpy array to a string (or a list of strings).
Parameters: input_array (np.ndarray) – Array of strings encoded in ASCII format.
Returns: String or list of strings.
Return type: str/list
Examples
Convert a numpy array to a string.
>>> from dbcollection.utils.string_ascii import convert_ascii_to_str
>>> import numpy as np
>>> # ascii format of 'string1'
>>> tensor = np.array([[115, 116, 114, 105, 110, 103, 49, 0]], dtype=np.uint8)
>>> convert_ascii_to_str(tensor)
['string1']
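The zero-terminated, zero-padded layout can be illustrated with a small round-trip sketch. The names strings_to_ascii and ascii_to_strings are hypothetical, and the sketch always returns a 2-D array, unlike the documented function, which returns a 1-D array for a single string.

```python
import numpy as np


def strings_to_ascii(strings):
    """Encode strings as rows of a zero-padded uint8 array."""
    if isinstance(strings, str):
        strings = [strings]
    # Width is the longest string plus one, so every row ends with a 0.
    width = max(len(s) for s in strings) + 1
    out = np.zeros((len(strings), width), dtype=np.uint8)
    for i, s in enumerate(strings):
        out[i, :len(s)] = np.frombuffer(s.encode("ascii"), dtype=np.uint8)
    return out


def ascii_to_strings(arr):
    """Decode each row back to a string, dropping the zero padding."""
    return [row[row != 0].tobytes().decode("ascii") for row in arr]
```

Shorter strings are padded with zeros on the right, which decoding strips again.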
dbcollection.utils.string_ascii.str_to_ascii(input_str)
Converts a string to an ASCII-encoded numpy array.
Converts a single string of characters into a numpy array encoded as ASCII.
Parameters: input_str (str) – String data.
Returns: One-dimensional array of char values encoded in ASCII format.
Return type: np.ndarray
Examples
Convert a string to a numpy array.
>>> from dbcollection.utils.string_ascii import str_to_ascii
>>> str_to_ascii('string1')
array([115, 116, 114, 105, 110, 103, 49], dtype=uint8)
dbcollection.utils.string_ascii.ascii_to_str(input_array)
Converts an ASCII-encoded numpy array to a string.
Parameters: input_array (np.ndarray) – Input array vector (should have dtype=numpy.uint8).
Returns: Single string.
Return type: str
Examples
Convert a numpy array to a string.
>>> import numpy as np
>>> from dbcollection.utils.string_ascii import ascii_to_str
>>> ascii_to_str(np.array([115, 116, 114, 105, 110, 103, 49], dtype=np.uint8))
'string1'
HDF5
HDF5 utility functions.
dbcollection.utils.hdf5.hdf5_write_data(h5_handler, field_name, data, dtype=None, chunks=True, compression='gzip', compression_opts=4, fillvalue=-1)
Writes/stores data into an HDF5 file.
Parameters:
- h5_handler (h5py._hl.group.Group) – Handler for an HDF5 group object.
- field_name (str) – Field name.
- data (np.ndarray) – Data array.
- dtype (np.dtype, optional) – Data type.
- chunks (bool, optional) – Store data as chunks if True.
- compression (str, optional) – Compression algorithm type.
- compression_opts (int, optional) – Compression level (for gzip, h5py accepts integers from 0 to 9).
- fillvalue (int/float, optional) – Value to pad the data.
Returns: Handler for an HDF5 dataset object.
Return type: h5py._hl.dataset.Dataset
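These parameters correspond closely to the keyword arguments of h5py's Group.create_dataset. The sketch below is an assumption about how such a wrapper could look, not the library's actual code; write_field is a hypothetical name, and the file is kept in memory so nothing is written to disk.

```python
import h5py
import numpy as np


def write_field(h5_group, field_name, data, dtype=None, chunks=True,
                compression="gzip", compression_opts=4, fillvalue=-1):
    """Store `data` under `field_name` in the given HDF5 group."""
    return h5_group.create_dataset(
        field_name,
        data=data,
        dtype=dtype,
        chunks=chunks,
        compression=compression,
        compression_opts=compression_opts,
        fillvalue=fillvalue,
    )


# In-memory HDF5 file (core driver, no backing store) for illustration.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    group = f.create_group("train")
    dset = write_field(group, "object_ids", np.arange(12).reshape(3, 4))
    stored = dset[...]  # read the data back as a numpy array
```

The fill value must be representable in the dataset's dtype, so the default of -1 suits signed integer or float data rather than unsigned types.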
Dir db constructor
This module contains methods for parsing directories.
dbcollection.utils.os_dir.construct_dataset_from_dir(dir_path, verbose=True)
Builds a dataset from a directory.
This method creates a dataset from a root folder. The first-level child folders define the dataset’s partitions (train/val/test/etc.). The child folders of each partition define the dataset’s classes, and all files inside them are the data.
Parameters:
- dir_path (str) – Directory path to create the dataset structure from.
- verbose (bool, optional) – Prints messages to the screen (if True).
Returns: Dataset structure.
Return type: dict
dbcollection.utils.os_dir.construct_set_from_dir(dir_path, verbose=True)
Builds a set from a directory.
This method creates a set from a root folder. The first-level child folders define the classes, and all files inside them are the data.
Parameters:
- dir_path (str) – Directory path to create the set structure from.
- verbose (bool, optional) – Prints messages to the screen (if True).
Returns: Set structure with keys as class names and values as image filenames.
Return type: dict
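The folder convention described above (root/partition/class/files) can be walked with the standard library alone. The sketch below uses the hypothetical names set_from_dir and dataset_from_dir. It is an assumption that the dataset structure is a dict of per-partition set structures; the documentation states the shape of the set structure only.

```python
import os


def set_from_dir(set_dir):
    """Map each class folder in `set_dir` to its sorted file names."""
    classes = {}
    for class_name in sorted(os.listdir(set_dir)):
        class_dir = os.path.join(set_dir, class_name)
        if os.path.isdir(class_dir):
            classes[class_name] = sorted(
                f for f in os.listdir(class_dir)
                if os.path.isfile(os.path.join(class_dir, f))
            )
    return classes


def dataset_from_dir(root_dir):
    """Map each partition folder (train/val/test/...) to its set structure."""
    return {
        name: set_from_dir(os.path.join(root_dir, name))
        for name in sorted(os.listdir(root_dir))
        if os.path.isdir(os.path.join(root_dir, name))
    }
```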
Test
Test utility functions/classes.
TestBaseDB
class dbcollection.utils.test.TestBaseDB(name, task, data_dir, verbose=True)
Test class for loading datasets.
Parameters:
- name (str) – Name of the dataset.
- task (str) – Name of the task.
- data_dir (str) – Path of the dataset’s data directory on disk.
- verbose (bool, optional) – Be verbose.
Variables:
- name (str) – Name of the dataset.
- task (str) – Name of the task.
- data_dir (str) – Path of the dataset’s data directory on disk.
- verbose (bool) – Be verbose.
download(extract_data=True)
Downloads a dataset to disk.
Parameters: extract_data (bool) – Flag signaling to extract data to disk (if True).
load()
Returns a data loader object for a dataset.
Returns: A data loader object of a dataset.
Return type: DataLoader
print_info(loader)
Prints information about the dataset to the screen.
Parameters: loader (DataLoader) – Data loader object of a dataset.
TestDatasetGenerator
Third party modules
Third-party modules used by dbcollection.
caltech_pedestrian_extractor
Extract images (.seq to .jpg) and annotation files (.vbb to .json) from the Caltech Pedestrian Dataset.
dbcollection.utils.db.caltech_pedestrian_extractor.converter.extract_data(data_path, save_path, sets=None)
Extracts image and annotation data from .vbb and .seq files.
Parameters:
- data_path (str) – Directory path of data files.
- save_path (str) – Directory path to store the extracted data.
- sets (str/list/tuple, optional) – List of set names to extract.
Raises: TypeError – If the sets input argument is not a string, list or tuple.
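The type check on sets can be pictured as a small normalization step. The sketch below is hypothetical: normalize_sets and its default_sets argument are not part of the library, and treating None as "use all default sets" is an assumption.

```python
def normalize_sets(sets=None, default_sets=("set00", "set01")):
    """Return `sets` as a list of set names, or raise TypeError."""
    if sets is None:
        # Assumption: no selection means every default set is extracted.
        return list(default_sets)
    if isinstance(sets, str):
        return [sets]  # a single name becomes a one-element list
    if isinstance(sets, (list, tuple)):
        return list(sets)
    raise TypeError("sets must be a string, list or tuple")
```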