Abstract
Table B-1 is a brief survey of public domain datasets in various categories, in no particular order. Note that many of the public domain datasets are freely available from universities and government agencies.
Name | SUN |
---|---|
Description | Annotated scenes and objects |
Categories | 908 scene categories, 3,819 object categories, 131,072 objects, and growing |
Contributions | Open to contributions |
Tools and apps | Image classifier source code + API, iOS app, Android app |
Key papers | [70] |
Owner | MIT CSAIL |
Link |
Name | UC Irvine Machine Learning Repository |
---|---|
Description | Very useful; huge repository of many categories of images |
Categories | Too many to list; very wide range of categories, many attributes of the data are specifically searchable and designed into the ground truth datasets |
Contributions | Ongoing |
Tools and apps | Online assistant to search for specific ground truth datasets |
Key papers | [550] |
Link |
Name | Stanford 3D Scanning Repository |
---|---|
Description | High-resolution 3D scanned images with sub-millimeter accuracy, including XYZ and RGB datasets |
Categories | Several scanned 3D objects with 3D point clouds, with resolutions ranging from 3,400,000 scanned points to 750,000 triangles and upward |
Link |
Name | KITTI Benchmark Suite, Karlsruhe Institute of Technology |
---|---|
Description | Stereo datasets for various city driving scenes |
Categories | The KITTI benchmark suite covers optical flow, odometry, object detection, and object orientation estimation; Karlsruhe sequences cover grayscale stereo sequences taken from a moving platform driving through a city; Karlsruhe objects cover grayscale stereo sequences taken from a moving platform driving through a city |
Link |
Name | Caltech Object Recognition Datasets |
---|---|
Description | Old but still useful; objects in hundreds of categories, some annotated with outlines |
Categories | Over 256 categories, including animals, plants, people, common objects, common food items, tools, furniture, and more |
Key papers | [71] |
Link | http://www.vision.caltech.edu/Image_Datasets/Caltech101/; http://www.vision.caltech.edu/Image_Datasets/Caltech256/; http://authors.library.caltech.edu/7694/ (latest versions of 101 and 256) |
Name | ImageNet + WordNet |
---|---|
Description | Labeled, annotated, bounding-boxed, and feature-descriptor-marked images; 14,197,122 images indexed into 21,841 sets of similar images, or synsets, organized using the sister project WordNet |
Categories | Categories include almost anything |
Contributions | Images taken from Internet searches |
Tools and apps | Online controls: http://www.image-net.org/download-API; source code: ImageNet Large Scale Visual Recognition Challenge (ILSVRC2010), http://www.image-net.org/challenges/LSVRC/2010/index |
Key papers | [72]; for several others, see http://www.image-net.org/about-publication |
Owner | Images have individual owners; website is © Stanford and Princeton |
Link |
Name | Middlebury Computer Vision Datasets |
---|---|
Description | Scholarly and comprehensive datasets, and algorithm comparisons over most of the datasets |
Categories | Stereo vision (excellent), multi-view stereo (excellent), MRF, optical flow (excellent), color processing |
Contributions | Algorithm benchmarks over the datasets can be submitted |
Key papers | Several; see website |
Owner | Middlebury College |
Link |
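
Submitted stereo algorithms are ranked by comparing their disparity maps against the ground truth. As an illustration only, the sketch below computes one common stereo error measure, the percentage of "bad" pixels whose absolute disparity error exceeds a threshold. The function name `bad_pixel_percentage`, the 1.0-pixel default threshold, and the convention that a ground truth value of 0 marks a pixel with no ground truth are assumptions for this example; it is not the official Middlebury evaluation code.

```python
def bad_pixel_percentage(disp, gt, threshold=1.0, invalid=0.0):
    """Percentage of ground truth pixels whose |disp - gt| exceeds threshold.

    disp and gt are 2D sequences (lists of rows) with the same shape.
    Assumption: a ground truth value equal to `invalid` means "unknown" and is skipped.
    """
    if len(disp) != len(gt):
        raise ValueError("row count mismatch")
    total = bad = 0
    for d_row, g_row in zip(disp, gt):
        if len(d_row) != len(g_row):
            raise ValueError("column count mismatch")
        for d, g in zip(d_row, g_row):
            if g == invalid:
                continue  # no ground truth at this pixel
            total += 1
            if abs(d - g) > threshold:
                bad += 1
    if total == 0:
        raise ValueError("ground truth has no valid pixels")
    return 100.0 * bad / total
```

For example, with ground truth `[[10, 12], [0, 8]]` and estimate `[[10.5, 14], [3, 8]]`, three pixels have ground truth and one of them is off by more than 1.0, giving about 33.3%.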
Name | ADL Activity Recognition Dataset |
---|---|
Description | Annotated scenes of common daily-living activities for activity recognition |
Categories | Daily life |
Tools and apps | Activity recognition code available (see link below) |
Key papers | [73] |
Link |
Name | MIT Indoor Scenes 67, Scene Classification |
---|---|
Description | Annotated dataset specifically containing diverse indoor scenes |
Categories | 15,620 images organized into 67 indoor categories, some annotations in LabelMe format |
Key papers | [74] |
Link |
Name | RGB-D Object Recognition Dataset, U of W |
---|---|
Description | Dataset contains RGB and corresponding depth images |
Categories | 300 common household objects in 51 categories, organized with WordNet in the style of ImageNet (reviewed above); each object is recorded in RGB and Kinect depth at various rotational angles and viewpoints |
Key papers | [75] |
Link |
Name | NYU Depth Datasets |
---|---|
Description | Annotated dataset of indoor scenes with RGB-D data and accelerometer data |
Categories | Over 500,000 frames, many different indoor scenes and scene types, thousands of classes, accelerometer data, inpainted and raw depth information |
Tools and apps | Matlab toolbox + g++ code |
Key papers | [76] |
Link |
Name | Intel Labs Seattle - Egocentric Recognition of Handled Objects |
---|---|
Description | Annotated dataset for egocentric handled objects using a wearable camera |
Categories | Over 42 everyday objects under varied lighting, occlusion, and perspectives; over 6 GB of video sequence data in total |
Key papers | [77] [78] |
Link |
Name | Georgia Tech GTEA Egocentric Activities - Gaze(+) |
---|---|
Description | Annotated dataset for egocentric handled objects using a wearable camera |
Categories | Many everyday objects under varied lighting, occlusion, perspectives |
Tools and apps | Code library of vision functions and mathematical functions |
Key papers | [79] |
Link |
Name | CUReT: Columbia-Utrecht Reflectance and Texture Database |
---|---|
Description | Extensive texture sample datasets captured under many viewing and illumination directions |
Categories | Over 60 different samples with over 200 viewing and illumination combinations, a BRDF measurement database, and more |
Key papers | [80] |
Link |
Name | MIT Flickr Material Surface Category Dataset |
---|---|
Description | Dataset for identifying material categories including fabric, glass, metal, plastic, water, foliage, leather, paper, stone, wood |
Categories | Contains images of materials for surface property analysis, as opposed to object or texture analysis; 10 material categories with 100 images in each |
Key papers | [81] |
Link |
Name | Faces in the Wild |
---|---|
Description | Collection of over 13,000 images of faces annotated with names of people |
Categories | Faces |
Key papers | [82] |
Link |
Name | The CMU Multi-PIE Face Database |
---|---|
Description | Annotated face and emotion database with multiple pose angles |
Categories | 750,000 face images of 337 subjects, captured over a period of several months from 15 viewpoints under 19 illumination conditions, with annotated facial expressions |
Key papers | [83] |
Link |
Name | Stanford 40 Actions |
---|---|
Description | People actions image database |
Categories | People performing 40 actions, bounding-box annotations, 9,532 images, 180-300 images per action class |
Key papers | [84] |
Link |
Name | NORB 3D Object Recognition from Shape |
---|---|
Description | NYU object recognition benchmark |
Categories | Stereo image pairs; 194,400 total images of 50 toys under 36 azimuths, 9 elevations, and 6 lighting conditions |
Tools and apps | EBLEARN C++ learning and vision library, LUSH programming language, VisionGRader object detection tool |
Key papers | [85] |
Link |
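
The NORB total can be checked against its capture grid. Assuming one stereo pair per combination of toy, azimuth, elevation, and lighting condition (an interpretation of the entry above, not something the entry states), the counts multiply out as follows:

```latex
\underbrace{50}_{\text{toys}} \times \underbrace{36}_{\text{azimuths}} \times \underbrace{9}_{\text{elevations}} \times \underbrace{6}_{\text{lightings}} = 97{,}200 \text{ stereo pairs}, \qquad 97{,}200 \times 2 = 194{,}400 \text{ images}
```

The result matches the stated total of 194,400 images.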
Name | Optical Flow Algorithm Evaluation |
---|---|
Description | Tools and data for optical flow evaluation purposes |
Categories | Many optical flow sequence ground truth datasets |
Tools and apps | A tool for generating optical flow data, and code for some optical flow algorithms |
Key papers | [86] |
Link |
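
Optical flow benchmarks of this kind compare an estimated flow field with the ground truth flow. A minimal sketch of one widely used measure, average endpoint error (the mean Euclidean distance between estimated and true flow vectors), follows. Representing each flow field as a flat list of `(u, v)` tuples and the name `average_endpoint_error` are assumptions made for illustration; a benchmark's own tools may also mask out pixels without ground truth.

```python
import math

def average_endpoint_error(flow_est, flow_gt):
    """Mean Euclidean distance between estimated and ground truth (u, v) vectors."""
    if not flow_gt or len(flow_est) != len(flow_gt):
        raise ValueError("flow fields must be non-empty and of equal length")
    total = 0.0
    for (u, v), (u_gt, v_gt) in zip(flow_est, flow_gt):
        # Endpoint error at one pixel: length of the difference vector.
        total += math.hypot(u - u_gt, v - v_gt)
    return total / len(flow_gt)
```

For example, estimates `[(3.0, 4.0), (1.0, 1.0)]` against ground truth `[(0.0, 0.0), (1.0, 1.0)]` give per-pixel errors of 5 and 0, so the average is 2.5.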
Name | PETS Crowd Sensing Dataset Challenge |
---|---|
Description | Multi-sensor camera views combined into a dataset of crowd activity sequences |
Categories | Challenge goals include crowd size and density estimation, tracking of specific people, and crowd flow |
Key papers | [94] |
Link |
Name | I-LIDS |
---|---|
Description | Security-oriented ground truth dataset for competitive benchmarking, including scenes for locating parked vehicles and abandoned baggage, securing perimeters, and doorway surveillance |
Categories | Various categories in the security domain |
Contributions | No; funded by the UK government |
Tools and apps | n.a. |
Key papers | n.a. |
Link |
Name | TRECVID, NIST, US Government |
---|---|
Description | NIST-sponsored public project spanning 2001-2013 for research in automatic segmentation, indexing, and content-based video retrieval |
Categories | (1) Semantic indexing (SIN); (2) known-item search (KIS); (3) instance search (INS); (4) multimedia event detection (MED); (5) multimedia event recounting (MER); (6) surveillance event detection (SED); content includes natural scenes, humans, vegetation, pets, office objects, and more |
Contributions | Annually, by the U.S. government |
Tools and apps | The Framework For Detection Evaluations (F4DE) tool, story evaluation tool, and others |
Key papers | [95] |
Link |
Name | Microsoft Research Cambridge |
---|---|
Description | Pixel-wise labeled or segmented objects |
Categories | Several hundred objects |
Link | http://research.microsoft.com/en-us/projects/objectclassrecognition/ |
Name | Optical Flow Algorithm Evaluation |
---|---|
Description | Volume-rendered video scenes for optical flow algorithm benchmarking |
Categories | Various scenes for optical flow; mainly synthetic sequences generated via ray tracing |
Contributions | n.a. |
Tools and apps | Yes, Tcl/Tk |
Key papers | [96] |
Link |
Name | Pascal Object Recognition VOC Challenge Dataset |
---|---|
Description | Standardized ground truth data for a research challenge spanning 2005-2013 in the area of object recognition; competitions include classification, detection, segmentation, and actions over each of 20 classes of data |
Categories | 20 classes of objects in scenes, including persons, animals, vehicles, and indoor objects |
Contributions | Via the Pascal conference |
Tools and apps | Includes a developer kit and other useful software for labeling data and database access, and tools for reporting benchmark results |
Key papers | [97] |
Link |
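
In the PASCAL VOC detection task, a predicted bounding box counts as a correct detection when its area of overlap with a ground truth box, divided by the area of their union, exceeds 50%. A minimal sketch of that overlap test follows. Boxes are assumed to be `(xmin, ymin, xmax, ymax)` tuples in continuous coordinates, and the function names are hypothetical; the official VOC development kit's exact pixel conventions may differ.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct_detection(pred, gt, threshold=0.5):
    """VOC-style test: the overlap ratio must exceed the threshold (50% by default)."""
    return iou(pred, gt) > threshold
```

Two 2 x 2 boxes offset by one unit in each direction overlap in a 1 x 1 square, so their ratio is 1/7 and the detection is rejected.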
Name | CRCV |
---|---|
Description | Very extensive; University of Central Florida’s Center for Research in Computer Vision hosts a large collection of research data covering several domains |
Categories | Comprehensive set of categories (aerial views, ground views), including dynamic textures, multi-modal iPhone sensor ground truth data (video, accelerometer, gyro), several categories of human actions, crowd segmentation, parking lots, and much more |
Contributions | n.a. |
Tools and apps | n.a. |
Key papers | [98] |
Link |
Name | UCB Contour Detection and Image Segmentation |
---|---|
Description | U.C. Berkeley Computer Vision group provides a complete set of ground truth data, algorithms, and performance evaluations for contour detection, image segmentation, and some interest point methods |
Categories | 500 natural-scene images covering a wide range of subjects, with labeled ground truth data |
Contributions | n.a. |
Tools and apps | Benchmarking code (globalPB for CPU and GPU) |
Key papers | [99] |
Link | http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html#bench |
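
The Berkeley boundary benchmark summarizes contour detection performance with precision, recall, and their harmonic mean, the F-measure. The boundary-matching step that produces precision and recall is beyond this sketch; assuming those two values are already available, the F-measure can be computed as follows (the function name `f_measure` is hypothetical).

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, the score is pulled toward the weaker of the two values: precision 0.8 with recall 0.6 gives about 0.686 rather than the arithmetic mean of 0.7.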
Name | CAVIAR Ground Truth Videos for Context-Aware Vision |
---|---|
Description | Project site containing labeled and annotated ground truth data of people in urban settings, including 52 videos (90K frames in total) of indoor office scenes and shopping centers |
Categories | Both scripted and real-life activities in shopping centers and offices, including walking, browsing, meeting, fighting, window shopping, entering/exiting stores |
Contributions | n.a. |
Tools and apps | n.a. |
Key papers | [100] |
Link |
Name | Boston University Computer Science Department |
---|---|
Description | Image and video database covering a wide range of subject categories |
Categories | Video sequences for head tracking and sign language; some datasets are labeled; still images for hand tracking, multi-face tracking, vehicle tracking, more |
Contributions | Anonymous FTP |
Tools and apps | n.a. |
Key papers | [101] |
Link |
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits any noncommercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this chapter or parts of it.
The images or other third party material in this chapter are included in the chapter’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2014 Scott Krig
About this chapter
Cite this chapter
Krig, S. (2014). Survey of Ground Truth Datasets. In: Computer Vision Metrics. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4302-5930-5_10
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4302-5929-9
Online ISBN: 978-1-4302-5930-5