utils

Utility methods for url download, file extraction, data padding and parsing, testing, etc.

Also, all third-party submodules are located under this module.

URL download

Download functions.

dbcollection.utils.url.check_if_url_files_exist(urls, save_dir)[source]

Evaluates if all url filenames exist on disk.

Parameters:
  • urls (list/tuple/dict) – URL paths.
  • save_dir (str) – Directory to store the downloaded data.
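
As an illustration of the check described above, here is a minimal sketch using only the standard library. It is not dbcollection's implementation: the helper names `url_filename` and `all_url_files_exist` are hypothetical, the sketch only handles a list or tuple of URL strings, and it assumes the file name on disk is the last path component of the URL.

```python
import os
from urllib.parse import urlparse


def url_filename(url):
    """Return the last path component of a URL (assumed to be the file name)."""
    return os.path.basename(urlparse(url).path)


def all_url_files_exist(urls, save_dir):
    """Return True only if every URL's file name is present in save_dir."""
    return all(
        os.path.isfile(os.path.join(save_dir, url_filename(url)))
        for url in urls
    )
```

A check like this lets a caller skip the download step when all files are already on disk.
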
dbcollection.utils.url.download_extract_urls(urls, save_dir, extract_data=True, verbose=True)[source]

Download URLs and extract the files to disk.

Parameters:
  • urls (list/tuple/dict) – URL paths.
  • save_dir (str) – Directory to store the downloaded data.
  • extract_data (bool, optional) – Extracts/unpacks the data files (if true).
  • verbose (bool, optional) – Display messages on screen if set to True.
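
The download-then-extract flow can be sketched with the standard library as follows. This is an assumption-laden illustration, not the library's code: `download_and_extract` is a hypothetical name, it uses `urllib.request.urlretrieve` rather than the requests module, accepts only a list of URL strings, and relies on `shutil.unpack_archive` to recognise the archive format from the file extension.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def download_and_extract(urls, save_dir, extract_data=True, verbose=True):
    """Download each URL into save_dir and optionally unpack it there."""
    os.makedirs(save_dir, exist_ok=True)
    saved = []
    for url in urls:
        # Assumption: the file name is the last path component of the URL.
        filename = os.path.join(save_dir, os.path.basename(urlparse(url).path))
        if not os.path.isfile(filename):  # skip files already on disk
            if verbose:
                print("Downloading", url)
            urllib.request.urlretrieve(url, filename)
        if extract_data:
            if verbose:
                print("Extracting", filename)
            shutil.unpack_archive(filename, save_dir)
        saved.append(filename)
    return saved
```
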
dbcollection.utils.url.extract_archive_file(filename, save_dir)[source]

Extracts a file archive’s data to a directory.

Parameters:
  • filename (str) – File name + path of the archive file.
  • save_dir (str) – Directory to extract the file archive into.
class dbcollection.utils.url.URL[source]

URL manager class.

class dbcollection.utils.url.URLDownload[source]

Download a URL using the requests module.

class dbcollection.utils.url.URLDownloadGoogleDrive[source]

Download a URL from Google Drive.

File loading

Library to load different types of file into memory.

dbcollection.utils.file_load.load_json(fname)[source]

Loads a json file to memory.

Parameters:fname (str) – File name + path.
Returns:Data structure of the input json file.
Return type:dict/list
dbcollection.utils.file_load.load_matlab(fname)[source]

Loads a matlab file to memory.

Parameters:fname (str) – File name + path.
Returns:Data structure of the input matlab file.
Return type:dict/list
dbcollection.utils.file_load.load_pickle(fname)[source]

Loads a pickle file to memory.

Parameters:fname (str) – File name + path.
Returns:Data structure of the input file.
Return type:dict/list
dbcollection.utils.file_load.load_txt(fname, mode='r')[source]

Loads a .txt file to memory.

Parameters:
  • fname (str) – File name + path.
  • mode (str, optional) – File open mode.
Returns:

Return type:

list of strings
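
For orientation, the JSON, pickle and text loaders described above behave roughly like the following standard-library sketch. The function names are hypothetical stand-ins, and the assumption that the text loader returns one string per line (without newline characters) is not confirmed by the documentation.

```python
import json
import pickle


def read_json(fname):
    """Parse a JSON file and return the resulting dict or list."""
    with open(fname, "r") as f:
        return json.load(f)


def read_pickle(fname):
    """Load a pickled object (pickle files must be opened in binary mode)."""
    with open(fname, "rb") as f:
        return pickle.load(f)


def read_txt_lines(fname, mode="r"):
    """Return the file contents as a list of lines without newlines."""
    with open(fname, mode) as f:
        return f.read().splitlines()
```
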

dbcollection.utils.file_load.load_xml(fname)[source]

Loads and parses an XML file into a dictionary.

Parameters:fname (str) – File name + path.
Returns:Dictionary of the input file’s data structure.
Return type:dict
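
One way to turn XML into a dictionary is a recursive walk over the element tree. The sketch below is an assumption about what such a conversion might look like, not the parser that load_xml uses: the helper names are hypothetical, XML attributes are ignored, and repeated child tags are collected into lists.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def element_to_dict(elem):
    """Recursively convert an Element into nested dicts, lists and strings."""
    children = list(elem)
    if not children:
        # Leaf node: keep only its text content.
        return (elem.text or "").strip()
    grouped = defaultdict(list)
    for child in children:
        grouped[child.tag].append(element_to_dict(child))
    # Repeated tags stay as lists; unique tags collapse to a single value.
    return {tag: vals[0] if len(vals) == 1 else vals
            for tag, vals in grouped.items()}


def xml_string_to_dict(text):
    """Parse an XML string and return {root_tag: converted_content}."""
    root = ET.fromstring(text)
    return {root.tag: element_to_dict(root)}
```
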

Padding

Library of methods for padding/unpadding lists or lists of lists with fill values.

dbcollection.utils.pad.pad_list(listA, val=-1, length=None)[source]

Pad list of lists with ‘val’ such that all lists have the same length.

Parameters:
  • listA (list) – List of lists of different sizes.
  • val (number, optional) – Value to pad the lists.
  • length (number, optional) – Total length of the list.
Returns:

A list of lists with the same length.

Return type:

list

Examples

Pad an uneven list of lists with a value.

>>> from dbcollection.utils.pad import pad_list
>>> pad_list([[0,1,2,3],[4,5,6],[7,8],[9]])  # pad with -1 (default)
[[0, 1, 2, 3], [4, 5, 6, -1], [7, 8, -1, -1], [9, -1, -1, -1]]
>>> pad_list([[1,2],[3,4]])  # does nothing
[[1, 2], [3, 4]]
>>> pad_list([[],[1],[3,4,5]], 0)  # pad lists with 0
[[0, 0, 0], [1, 0, 0], [3, 4, 5]]
>>> pad_list([[],[1],[3,4,5]], 0, 6)  # pad lists with 0 of size 6
[[0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0], [3, 4, 5, 0, 0, 0]]
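
The padding behaviour described above can be sketched in a few lines. This is an illustrative reimplementation under a hypothetical name (`pad_rows`), not the library's source. How the library treats a `length` shorter than the longest list is not documented here, so the sketch assumes it never truncates.

```python
def pad_rows(rows, val=-1, length=None):
    """Right-pad every row with `val` so that all rows share one length."""
    longest = max((len(row) for row in rows), default=0)
    # Assumption: a requested length below the longest row is ignored.
    target = longest if length is None else max(length, longest)
    return [list(row) + [val] * (target - len(row)) for row in rows]
```

Rows that already have the target length are returned as copies, so the input is never modified.
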
dbcollection.utils.pad.unpad_list(listA, val=-1)[source]

Remove all values equal to ‘val’ from each list in a list of lists.

Parameters:
  • listA (list) – List of lists of equal sizes.
  • val (number, optional) – Value to unpad the lists.
Returns:

A list of lists without the padding values.

Return type:

list

Examples

Remove the padding values of a list of lists.

>>> from dbcollection.utils.pad import unpad_list
>>> unpad_list([[1,2,3,-1,-1],[5,6,-1,-1,-1]])
[[1, 2, 3], [5, 6]]
>>> unpad_list([[5,0,-1],[1,2,3,4,5]], 5)
[[0, -1], [1, 2, 3, 4]]
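
Judging from the examples above, every element equal to the padding value is removed, not only trailing ones. A minimal sketch consistent with those examples follows; `unpad_rows` is a hypothetical name and the code is an assumption, not the library's implementation.

```python
def unpad_rows(rows, val=-1):
    """Drop every element equal to `val` from each row."""
    return [[item for item in row if item != val] for row in rows]
```

A consequence of this behaviour is that genuine data values equal to the padding value are removed as well, so the padding value should be one that never occurs in the data.
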
dbcollection.utils.pad.squeeze_list(listA, val=-1)[source]

Compact a list of lists into a single list.

Squeezes (spaghettifies) a list of lists into a single list. The lists are concatenated, with a separator value inserted between them to mark where to split when unsqueezing the list.

Parameters:
  • listA (list) – List of lists.
  • val (number, optional) – Value to separate the lists.
Returns:

A list with all lists concatenated into one.

Return type:

list

Examples

Compact a list of lists into a single list.

>>> from dbcollection.utils.pad import squeeze_list
>>> squeeze_list([[1,2], [3], [4,5,6]], -1)
[1, 2, -1, 3, -1, 4, 5, 6]
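
The concatenation with separators can be sketched as follows. The name `squeeze` is hypothetical and the code is an illustration that matches the example above, not the library's source.

```python
def squeeze(rows, val=-1):
    """Concatenate rows into one flat list, placing `val` between them."""
    flat = []
    for index, row in enumerate(rows):
        if index > 0:
            flat.append(val)  # separator goes between rows, never at the end
        flat.extend(row)
    return flat
```
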
dbcollection.utils.pad.unsqueeze_list(listA, val=-1)[source]

Unpacks a list into a list of lists.

Returns a list of lists by splitting the input list into ‘N’ lists wherever it encounters an element equal to ‘val’. Empty lists resulting from trailing values at the end of the list are discarded.

Source: https://stackoverflow.com/questions/4322705/split-a-list-into-nested-lists-on-a-value

Parameters:
  • listA (list) – A list.
  • val (int/float, optional) – Value to separate the lists.
Returns:

A list of lists.

Return type:

list

Examples

Unpack a list into a list of lists.

>>> from dbcollection.utils.pad import unsqueeze_list
>>> unsqueeze_list([1, 2, -1, 3, -1, 4, 5, 6], -1)
[[1, 2], [3], [4, 5, 6]]
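
Splitting on a separator value can be done with `itertools.groupby`, which is one of the approaches discussed in the Stack Overflow thread linked above. The sketch below uses a hypothetical name (`unsqueeze`) and is not necessarily the variant the library adopted.

```python
from itertools import groupby


def unsqueeze(flat, val=-1):
    """Split a flat list into sub-lists at every run of `val` elements."""
    return [
        list(group)
        for is_separator, group in groupby(flat, lambda item: item == val)
        if not is_separator  # keep only the runs of real data
    ]
```

With this approach, consecutive or trailing separators never produce empty sub-lists, so empty lists do not survive a squeeze/unsqueeze round trip.
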

String<->ASCII

String-to-ascii and ascii-to-string conversion methods.

dbcollection.utils.string_ascii.convert_str_to_ascii(inp_str)[source]

Convert a list of strings into an ascii encoded numpy array.

Converts a string or list of strings to a numpy array. The array size is defined by the length of the string plus one. The extra element is needed for ascii-to-str conversion in Lua using ffi.string(), which expects a 0 at the end of an array.

If a list of strings is used, the size of the array is defined by the length of the longest string (plus one), and shorter strings are zero padded to maintain the array shape.

Parameters:inp_str (str/list/tuple) – String or list of strings to convert to an ascii array.
Returns:Single/multi-dimensional array of ASCII encoded strings.
Return type:np.ndarray

Examples

Example 1: Convert a string to a numpy array encoded into ASCII values.

>>> from dbcollection.utils.string_ascii import convert_str_to_ascii
>>> convert_str_to_ascii('string1')
array([115, 116, 114, 105, 110, 103,  49,   0], dtype=uint8)

Example 2: Convert a list of strings into an ASCII array.

>>> from dbcollection.utils.string_ascii import convert_str_to_ascii
>>> convert_str_to_ascii(['string1', 'string2', 'string3'])
array([[115, 116, 114, 105, 110, 103,  49,   0],
       [115, 116, 114, 105, 110, 103,  50,   0],
       [115, 116, 114, 105, 110, 103,  51,   0]], dtype=uint8)
dbcollection.utils.string_ascii.convert_ascii_to_str(input_array)[source]

Convert a numpy array to a string (or a list of strings).

Parameters:input_array (np.ndarray) – Array of strings encoded in ASCII format.
Returns:String or list of strings.
Return type:str/list

Examples

Convert a numpy array to a string.

>>> from dbcollection.utils.string_ascii import convert_ascii_to_str
>>> import numpy as np
>>> # ascii format of 'string1'
>>> tensor = np.array([[115, 116, 114, 105, 110, 103, 49, 0]], dtype=np.uint8)
>>> convert_ascii_to_str(tensor)
['string1']
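
The zero-terminated, zero-padded encoding described above can be illustrated without numpy. The sketch below uses plain lists of integers and hypothetical helper names (`encode_strings`, `decode_rows`); it assumes a non-empty list of ASCII strings and is not the library's implementation.

```python
def encode_strings(strings):
    """Encode strings as rows of ASCII codes, zero-terminated and zero-padded."""
    width = max(len(s) for s in strings) + 1  # +1 for the terminating 0
    return [[ord(ch) for ch in s] + [0] * (width - len(s)) for s in strings]


def decode_rows(rows):
    """Decode rows of ASCII codes, stopping at the first 0 in each row."""
    decoded = []
    for row in rows:
        end = row.index(0) if 0 in row else len(row)
        decoded.append("".join(chr(code) for code in row[:end]))
    return decoded
```

Because every row ends with at least one 0, the decoder can recover the original strings regardless of how much padding was added.
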
dbcollection.utils.string_ascii.str_to_ascii(input_str)[source]

Converts a string to an ascii encoded numpy array.

Converts a single string of characters into a numpy array coded as ascii.

Parameters:input_str (str) – String data.
Returns:Uni-dimensional array of char values encoded in ASCII format.
Return type:np.ndarray

Examples

Convert a string to numpy array.

>>> from dbcollection.utils.string_ascii import str_to_ascii
>>> str_to_ascii('string1')
array([115, 116, 114, 105, 110, 103,  49], dtype=uint8)
dbcollection.utils.string_ascii.ascii_to_str(input_array)[source]

Converts an ascii encoded numpy array to a string.

Parameters:input_array (np.ndarray) – Input array vector (should be of type dtype=numpy.uint8)
Returns:Single string.
Return type:str

Examples

Convert a numpy array to string.

>>> import numpy as np
>>> from dbcollection.utils.string_ascii import ascii_to_str
>>> ascii_to_str(np.array([115, 116, 114, 105, 110, 103,  49], dtype=np.uint8))
'string1'

HDF5

hdf5 utility functions.

dbcollection.utils.hdf5.hdf5_write_data(h5_handler, field_name, data, dtype=None, chunks=True, compression='gzip', compression_opts=4, fillvalue=-1)[source]

Write/store data into a hdf5 file.

Parameters:
  • h5_handler (h5py._hl.group.Group) – Handler for an HDF5 group object.
  • field_name (str) – Field name.
  • data (np.ndarray) – Data array.
  • dtype (np.dtype, optional) – Data type.
  • chunks (bool, optional) – Store data as chunks if True.
  • compression (str, optional) – Compression algorithm type.
  • compression_opts (int, optional) – Compression level (for gzip, h5py accepts integers from 0 to 9).
  • fillvalue (int/float, optional) – Value to pad the data.
Returns:

Handler for an HDF5 dataset object.

Return type:

h5py._hl.dataset.Dataset

Dir db constructor

This module contains methods for parsing directories.

dbcollection.utils.os_dir.construct_dataset_from_dir(dir_path, verbose=True)[source]

Build a dataset from a directory.

This method creates a dataset from a root folder. The first child folders compose the dataset’s partition into train/val/test/etc. Then, child folders of these compose the dataset’s classes and all files inside correspond to the data.

Parameters:
  • dir_path (str) – Directory path to create the dataset structure from.
  • verbose (bool, optional) – Prints messages to the screen (if True).
Returns:

Dataset structure.

Return type:

dict
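
The folder layout described above (partitions, then classes, then files) maps naturally onto nested dictionaries. The following sketch shows one way to build such a structure with `os.listdir`. The function names are hypothetical, and storing bare file names sorted alphabetically is an assumption rather than documented behaviour.

```python
import os


def build_set(dir_path):
    """Map each class folder name to the sorted file names inside it."""
    result = {}
    for cls in sorted(os.listdir(dir_path)):
        cls_dir = os.path.join(dir_path, cls)
        if os.path.isdir(cls_dir):
            result[cls] = sorted(
                name for name in os.listdir(cls_dir)
                if os.path.isfile(os.path.join(cls_dir, name))
            )
    return result


def build_dataset(dir_path):
    """Map each partition folder (train/val/test/...) to its set structure."""
    return {
        part: build_set(os.path.join(dir_path, part))
        for part in sorted(os.listdir(dir_path))
        if os.path.isdir(os.path.join(dir_path, part))
    }
```
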

dbcollection.utils.os_dir.construct_set_from_dir(dir_path, verbose=True)[source]

Build a set from a directory.

This method creates a set from a root folder. The first child folders compose the set’s classes and all files inside correspond to the data.

Parameters:
  • dir_path (str) – Directory path to create the set structure from.
  • verbose (bool, optional) – Prints messages to the screen (if True).
Returns:

Set structure with keys as class names and values as image filenames.

Return type:

dict

dbcollection.utils.os_dir.dir_get_size(dir_path)[source]

Returns the number of files and subfolders in a directory.

Parameters:dir_path (str) – Directory path.
Returns:
  • int – Number of files in the folder.
  • int – Number of folders in the path.

Test

Test utility functions/classes.

TestBaseDB

class dbcollection.utils.test.TestBaseDB(name, task, data_dir, verbose=True)[source]

Test Class for loading datasets.

Parameters:
  • name (str) – Name of the dataset.
  • task (str) – Name of the task.
  • data_dir (str) – Path of the dataset’s data directory on disk.
  • verbose (bool, optional) – Be verbose.
Variables:
  • name (str) – Name of the dataset.
  • task (str) – Name of the task.
  • data_dir (str) – Path of the dataset’s data directory on disk.
  • verbose (bool) – Be verbose.
delete_cache()[source]

Delete all cache data and the cache directory.

download(extract_data=True)[source]

Download a dataset to disk.

Parameters:extract_data (bool) – Flag signaling to extract data to disk (if True).
list_datasets()[source]

Print dbcollection info

load()[source]

Return a data loader object for a dataset.

Returns:A data loader object of a dataset.
Return type:DataLoader
print_info(loader)[source]

Print information about the dataset to the screen

Parameters:loader (DataLoader) – Data loader object of a dataset.
process()[source]

Process dataset

run(mode)[source]

Run the test script.

Parameters:mode (str) – Task name to execute.
Raises:Exception – If an invalid mode is given.

TestDatasetGenerator

Timeout

class dbcollection.utils.test.Timeout(sec)[source]

Timeout class using ALARM signal.

exception Timeout[source]
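
A timeout based on the ALARM signal is typically written as a context manager. The sketch below shows the general technique under stated assumptions: the names `time_limit`, `AlarmTimeout` and `timed_out` are hypothetical, it works only on Unix and only in the main thread, and it uses `signal.setitimer` so that fractional seconds are accepted. It is not the library's Timeout class.

```python
import signal
import time


class AlarmTimeout(Exception):
    """Raised when the guarded block runs longer than allowed."""


class time_limit:
    """Context manager that raises AlarmTimeout after `sec` seconds."""

    def __init__(self, sec):
        self.sec = sec

    def __enter__(self):
        # Install our handler and remember the previous one.
        self._old_handler = signal.signal(signal.SIGALRM, self._on_alarm)
        signal.setitimer(signal.ITIMER_REAL, self.sec)
        return self

    def __exit__(self, exc_type, exc, tb):
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, self._old_handler)
        return False  # never swallow exceptions

    def _on_alarm(self, signum, frame):
        raise AlarmTimeout("timed out after %s seconds" % self.sec)


def timed_out(func, sec):
    """Run func() under a time limit; return True if the limit was hit."""
    try:
        with time_limit(sec):
            func()
    except AlarmTimeout:
        return True
    return False
```

Restoring the previous handler on exit keeps the guard from interfering with other code that uses SIGALRM.
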

Third party modules

Third-party modules used by dbcollection.

caltech_pedestrian_extractor

Extract images (.seq to .jpg) and annotation files (.vbb to .json) from the Caltech Pedestrian Dataset.

dbcollection.utils.db.caltech_pedestrian_extractor.converter.extract_data(data_path, save_path, sets=None)[source]

Extract image and annotation data from .vbb and .seq files.

Parameters:
  • data_path (str) – Directory path of data files.
  • save_path (str) – Directory path to store the extracted data.
  • sets (str/list/tuple, optional) – List of set names to extract.
Raises:

TypeError – If sets input arg is not a string, list or tuple.