Welcome to dbcollection’s documentation!
dbcollection is a library for downloading/parsing/managing datasets via simple methods.
It was built from the ground up to be cross-platform (Windows, Linux, macOS) and
cross-language (Python, Lua, Matlab, etc.). This is achieved by using the popular HDF5
file format to store (meta)data of manually parsed datasets and the power of Python for
scripting. By doing so, this library can target any platform that supports Python and
any language that has bindings for HDF5.
This package lets users easily manage and load datasets by using HDF5 files to store
metadata. By storing all the necessary metadata to disk, managing either big or small
datasets has an equal or very similar impact on the system’s resource usage.
Also, once a dataset is set up, it is set up forever! This means users can reuse any
previously set up dataset as many times as needed without having to set it up again.
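The "set up once, reuse afterwards" idea can be illustrated with a minimal sketch. This is not dbcollection's actual implementation or API: the `load_dataset` and `fake_setup` names are hypothetical, and the metadata is stored as JSON rather than HDF5 purely to keep the example free of third-party dependencies.

```python
import json
import tempfile
from pathlib import Path


def load_dataset(name, cache_dir, setup_fn):
    """Return metadata for `name`, running `setup_fn` only on the first call.

    Hypothetical helper for illustration; not part of dbcollection.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{name}.json"

    if cache_file.exists():
        # Already set up: read the stored metadata straight from disk.
        return json.loads(cache_file.read_text())

    # First use: run the expensive download/parse step once, then persist it.
    metadata = setup_fn(name)
    cache_file.write_text(json.dumps(metadata))
    return metadata


calls = []


def fake_setup(name):
    # Stand-in for downloading and parsing; records how often it runs.
    calls.append(name)
    return {"name": name, "classes": ["cat", "dog"], "num_samples": 2}


cache_root = tempfile.mkdtemp()
first = load_dataset("toy", cache_root, fake_setup)   # runs fake_setup
second = load_dataset("toy", cache_root, fake_setup)  # served from the cache file
```

Because the second call finds the cached file, the setup function runs only once, no matter how many times the dataset is loaded afterwards.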
dbcollection allows users to focus on more important tasks, like prototyping new models or testing them on different datasets, without losing time managing datasets or creating/modifying scripts to load/fetch data. It does this by taking advantage of the work of the community that shared these resources.
This library contains a (growing!) list of popular datasets in computer science for many fields like object detection, classification, human body joint detection, captioning, etc. It provides a great way to quickly start hacking on a number of different tasks by skipping the boring task of learning how to set/parse datasets (and sometimes dealing with human errors in annotated data).
Also, since it has been developed with the community in mind, this library should encourage users to write and share their own scripts for downloading/parsing other datasets.
Here are some of the key features dbcollection provides:
- Simple API to load/download/setup/manage datasets.
- Simple API to fetch data from a dataset.
- Store and pull data from disk or from memory: you choose!
- Datasets only need to be set up/processed once, so the next time you use one it will load instantly!
- Cross-platform (Windows, Linux, macOS).
- Cross-language (Python, Lua/Torch7, Matlab).
- Easily extensible to other languages that support HDF5.
- Concurrent/parallel data access thanks to HDF5.
- Contains a diverse (and growing!) list of popular datasets for machine learning and deep learning tasks (object detection, action recognition, human pose estimation, etc.).