Datasets


All data are for research purposes only, unless stated otherwise. Please make sure to cite the authors properly when using the data.

UCF-Crime Dataset

The UCF-Crime dataset is a new large-scale dataset, the first of its kind, comprising 128 hours of video. It consists of 1,900 long, untrimmed real-world surveillance videos covering 13 realistic anomalies: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. These anomalies were selected because they have a significant impact on public safety. The dataset can be used for two tasks: first, general anomaly detection, with all anomalies in one group and all normal activities in another; second, recognition of each of the 13 anomalous activities.
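
A minimal sketch of how the two tasks could be set up is given below, assuming a hypothetical <root>/<ClassName>/<video>.mp4 folder layout; the class-folder names (e.g. "RoadAccidents"), the file extension, and the root path are illustrative assumptions, not the official layout of the release.

from pathlib import Path

# The 13 anomaly classes described above; videos outside these folders are treated as normal.
ANOMALY_CLASSES = [
    "Abuse", "Arrest", "Arson", "Assault", "RoadAccidents", "Burglary",
    "Explosion", "Fighting", "Robbery", "Shooting", "Stealing",
    "Shoplifting", "Vandalism",
]

def build_labels(root):
    """Return (video_path, binary_label, class_id) tuples for the two tasks.

    binary_label: 1 = anomalous, 0 = normal (task 1, anomaly detection)
    class_id: index into ANOMALY_CLASSES, or -1 for normal videos (task 2, 13-way recognition)
    """
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        name = class_dir.name
        is_anomaly = name in ANOMALY_CLASSES
        class_id = ANOMALY_CLASSES.index(name) if is_anomaly else -1
        for video in sorted(class_dir.glob("*.mp4")):
            samples.append((str(video), int(is_anomaly), class_id))
    return samples

# Example usage (root path is hypothetical):
# for path, anomaly, cls in build_labels("UCF_Crime/Videos")[:5]:
#     print(path, anomaly, cls)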

Related publication:

Real-world Anomaly Detection in Surveillance Videos
Waqas Sultani, Chen Chen, Mubarak Shah
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
[Paper]
[Project (Dataset Download Option 1)]
Note: Please follow the dataset download instructions from the UCF site: Readme.txt
[Option 2]
Or Option 3: Download the dataset from Dropbox (multiple files): Link
Notes on using the dataset


UCF Cross-View Geolocalization Dataset

The UCF cross-view geolocalization dataset was created for the geo-localization task using cross-view image matching. The dataset contains street view and bird's eye view image pairs around downtown Pittsburgh, Orlando, and part of Manhattan. There are 1,586, 1,324, and 5,941 GPS locations in Pittsburgh, Orlando, and Manhattan, respectively. We utilize DualMaps to generate side-by-side street view and bird's eye view images at each GPS location with the same heading direction. The street view images are from Google, and the overhead 45-degree bird's eye view images are from Bing. For each GPS location, four image pairs are generated with camera heading directions of 0, 90, 180, and 270 degrees. To train the deep network for building matching, we annotate corresponding buildings in every street view and bird's eye view image pair.
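
Since each GPS location yields four street view / bird's eye view pairs (one per heading direction), the number of pairs follows directly from the location counts above. The sketch below enumerates them; the file-naming scheme and directory layout are purely hypothetical and would need to be adapted to the actual download.

from dataclasses import dataclass
from itertools import product

HEADINGS = (0, 90, 180, 270)  # camera heading directions in degrees

@dataclass
class CrossViewPair:
    """One street view / bird's eye view image pair at a GPS location."""
    city: str
    location_id: int
    heading: int

    @property
    def street_view(self):
        # Hypothetical naming scheme; adapt to the actual dataset layout.
        return f"{self.city}/streetview/{self.location_id:05d}_{self.heading:03d}.jpg"

    @property
    def birds_eye(self):
        return f"{self.city}/birdseye/{self.location_id:05d}_{self.heading:03d}.jpg"

# Number of GPS locations per city, as described above.
LOCATIONS = {"Pittsburgh": 1586, "Orlando": 1324, "Manhattan": 5941}

pairs = [
    CrossViewPair(city, loc, heading)
    for city, n in LOCATIONS.items()
    for loc, heading in product(range(n), HEADINGS)
]
print(len(pairs))  # 4 pairs per location: (1586 + 1324 + 5941) * 4 = 35404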

Related publication:

Cross-View Image Matching for Geo-localization in Urban Environments
Yicong Tian, Chen Chen, Mubarak Shah
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
[Paper] [Project (Download Cross-view dataset and code)]


UTD-MHAD Dataset

The UTD-MHAD dataset was collected as part of our research on human action recognition using the fusion of depth and inertial sensor data. The objective of this research has been to develop algorithms for more robust human action recognition by fusing data from sensors of differing modalities. The UTD-MHAD dataset consists of 27 different actions: (1) right arm swipe to the left, (2) right arm swipe to the right, (3) right hand wave, (4) two hand front clap, (5) right arm throw, (6) cross arms in the chest, (7) basketball shoot, (8) right hand draw x, (9) right hand draw circle (clockwise), (10) right hand draw circle (counter clockwise), (11) draw triangle, (12) bowling (right hand), (13) front boxing, (14) baseball swing from right, (15) tennis right hand forehand swing, (16) arm curl (two arms), (17) tennis serve, (18) two hand push, (19) right hand knock on door, (20) right hand catch an object, (21) right hand pick up and throw, (22) jogging in place, (23) walking in place, (24) sit to stand, (25) stand to sit, (26) forward lunge (left foot forward), (27) squat (two arms stretch out).
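
For readers working with the inertial recordings, the sketch below loads a single sample with SciPy and maps action IDs to short labels; the a<action>_s<subject>_t<trial>_inertial.mat file-name pattern, the 'd_iner' key, and the abbreviated action names are assumptions to be checked against the downloaded files.

from scipy.io import loadmat

# Actions 1-27 as listed above, abbreviated here for brevity.
ACTIONS = {
    1: "swipe_left", 2: "swipe_right", 3: "wave", 4: "clap", 5: "throw",
    6: "arm_cross", 7: "basketball_shoot", 8: "draw_x", 9: "draw_circle_CW",
    10: "draw_circle_CCW", 11: "draw_triangle", 12: "bowling", 13: "boxing",
    14: "baseball_swing", 15: "tennis_swing", 16: "arm_curl", 17: "tennis_serve",
    18: "push", 19: "knock", 20: "catch", 21: "pickup_throw", 22: "jog",
    23: "walk", 24: "sit_to_stand", 25: "stand_to_sit", 26: "lunge", 27: "squat",
}

def load_inertial(path):
    """Load one inertial-sensor recording (accelerometer + gyroscope).

    Assumes a MATLAB .mat file with the samples stored under the key
    'd_iner' as an (N, 6) array; adjust the key to match the release
    you downloaded."""
    mat = loadmat(path)
    return mat["d_iner"]

# Example usage (file name follows the assumed naming pattern):
# signal = load_inertial("a27_s1_t1_inertial.mat")
# print(ACTIONS[27], signal.shape)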

Related publication:

UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor
Chen Chen, Roozbeh Jafari, Nasser Kehtarnavaz
IEEE International Conference on Image Processing (ICIP), 2015
[Paper] [UTD Multimodal Human Action Dataset Website]