Datasets


All data are for research purposes only, unless stated otherwise. Please make sure to cite the authors properly when using the data.

UCF-Crime Dataset

The UCF-Crime dataset is a new large-scale dataset, the first of its kind, comprising 128 hours of video. It consists of 1,900 long, untrimmed real-world surveillance videos covering 13 realistic anomalies: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. These anomalies were selected because they have a significant impact on public safety. The dataset can be used for two tasks: first, general anomaly detection, with all anomalies in one group and all normal activities in another; second, recognition of each of the 13 anomalous activities.
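
A minimal sketch of how the two tasks could be set up is given below, assuming a hypothetical <root>/<ClassName>/<video>.mp4 folder layout; the class-folder names (e.g. "RoadAccidents"), the file extension, and the root path are illustrative assumptions, not the official layout of the release.

from pathlib import Path

# The 13 anomaly classes described above; videos outside these folders are treated as normal.
ANOMALY_CLASSES = [
    "Abuse", "Arrest", "Arson", "Assault", "RoadAccidents", "Burglary",
    "Explosion", "Fighting", "Robbery", "Shooting", "Stealing",
    "Shoplifting", "Vandalism",
]

def build_labels(root):
    """Return (video_path, binary_label, class_id) tuples for the two tasks.

    binary_label: 1 = anomalous, 0 = normal (task 1, anomaly detection)
    class_id: index into ANOMALY_CLASSES, or -1 for normal videos (task 2, 13-way recognition)
    """
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        name = class_dir.name
        is_anomaly = name in ANOMALY_CLASSES
        class_id = ANOMALY_CLASSES.index(name) if is_anomaly else -1
        for video in sorted(class_dir.glob("*.mp4")):
            samples.append((str(video), int(is_anomaly), class_id))
    return samples

# Example usage (root path is hypothetical):
# for path, anomaly, cls in build_labels("UCF_Crime/Videos")[:5]:
#     print(path, anomaly, cls)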

Related publication:

Real-world Anomaly Detection in Surveillance Videos
Waqas Sultani, Chen Chen, Mubarak Shah
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
[Paper]
[Project (Dataset Download Option 1)]
Note: Please follow the dataset download instructions from the UCF site: Readme.txt
[Option 2]
Or Option 3: Download the dataset from Dropbox (multiple files): Link
Notes on using the dataset


UCF Cross-View Geolocalization Dataset

The UCF cross-view geolocalization dataset was created for the geo-localization task using cross-view image matching. The dataset contains street view and bird's eye view image pairs around downtown Pittsburgh, Orlando, and part of Manhattan. There are 1,586, 1,324, and 5,941 GPS locations in Pittsburgh, Orlando, and Manhattan, respectively. We utilize DualMaps to generate side-by-side street view and bird's eye view images at each GPS location with the same heading direction. The street view images are from Google, and the overhead 45-degree bird's eye view images are from Bing. For each GPS location, four image pairs are generated with camera heading directions of 0, 90, 180, and 270 degrees. To train the deep network for building matching, we annotate corresponding buildings in every street view and bird's eye view image pair.
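
Since each GPS location yields four street view / bird's eye view pairs (one per heading direction), the number of pairs follows directly from the location counts above. The sketch below enumerates them; the file-naming scheme and directory layout are purely hypothetical and would need to be adapted to the actual download.

from dataclasses import dataclass
from itertools import product

HEADINGS = (0, 90, 180, 270)  # camera heading directions in degrees

@dataclass
class CrossViewPair:
    """One street view / bird's eye view image pair at a GPS location."""
    city: str
    location_id: int
    heading: int

    @property
    def street_view(self):
        # Hypothetical naming scheme; adapt to the actual dataset layout.
        return f"{self.city}/streetview/{self.location_id:05d}_{self.heading:03d}.jpg"

    @property
    def birds_eye(self):
        return f"{self.city}/birdseye/{self.location_id:05d}_{self.heading:03d}.jpg"

# Number of GPS locations per city, as described above.
LOCATIONS = {"Pittsburgh": 1586, "Orlando": 1324, "Manhattan": 5941}

pairs = [
    CrossViewPair(city, loc, heading)
    for city, n in LOCATIONS.items()
    for loc, heading in product(range(n), HEADINGS)
]
print(len(pairs))  # 4 pairs per location: (1586 + 1324 + 5941) * 4 = 35404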

Related publication:

Cross-View Image Matching for Geo-localization in Urban Environments
Yicong Tian, Chen Chen, Mubarak Shah
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
[Paper] [Project (Download Cross-view dataset and code)]


UTD-MHAD Dataset

The UTD-MHAD dataset was collected as part of our research on human action recognition using the fusion of depth and inertial sensor data. The objective of this research has been to develop algorithms for more robust human action recognition by fusing data from sensors of differing modalities. The UTD-MHAD dataset consists of 27 different actions: (1) right arm swipe to the left, (2) right arm swipe to the right, (3) right hand wave, (4) two hand front clap, (5) right arm throw, (6) cross arms in the chest, (7) basketball shoot, (8) right hand draw x, (9) right hand draw circle (clockwise), (10) right hand draw circle (counter clockwise), (11) draw triangle, (12) bowling (right hand), (13) front boxing, (14) baseball swing from right, (15) tennis right hand forehand swing, (16) arm curl (two arms), (17) tennis serve, (18) two hand push, (19) right hand knock on door, (20) right hand catch an object, (21) right hand pick up and throw, (22) jogging in place, (23) walking in place, (24) sit to stand, (25) stand to sit, (26) forward lunge (left foot forward), (27) squat (two arms stretch out).
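
For readers working with the inertial recordings, the sketch below loads a single sample with SciPy and maps action IDs to short labels; the a<action>_s<subject>_t<trial>_inertial.mat file-name pattern, the 'd_iner' key, and the abbreviated action names are assumptions to be checked against the downloaded files.

from scipy.io import loadmat

# Actions 1-27 as listed above, abbreviated here for brevity.
ACTIONS = {
    1: "swipe_left", 2: "swipe_right", 3: "wave", 4: "clap", 5: "throw",
    6: "arm_cross", 7: "basketball_shoot", 8: "draw_x", 9: "draw_circle_CW",
    10: "draw_circle_CCW", 11: "draw_triangle", 12: "bowling", 13: "boxing",
    14: "baseball_swing", 15: "tennis_swing", 16: "arm_curl", 17: "tennis_serve",
    18: "push", 19: "knock", 20: "catch", 21: "pickup_throw", 22: "jog",
    23: "walk", 24: "sit_to_stand", 25: "stand_to_sit", 26: "lunge", 27: "squat",
}

def load_inertial(path):
    """Load one inertial-sensor recording (accelerometer + gyroscope).

    Assumes a MATLAB .mat file with the samples stored under the key
    'd_iner' as an (N, 6) array; adjust the key to match the release
    you downloaded."""
    mat = loadmat(path)
    return mat["d_iner"]

# Example usage (file name follows the assumed naming pattern):
# signal = load_inertial("a27_s1_t1_inertial.mat")
# print(ACTIONS[27], signal.shape)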

Related publication:

UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor
Chen Chen, Roozbeh Jafari, Nasser Kehtarnavaz
IEEE International Conference on Image Processing (ICIP), 2015
[Paper] [UTD Multimodal Human Action Dataset Website]