Visual Recognition in Construction Sites for Fall Protection and Safety Enhancement
1. Research Motivation:
The U.S. construction industry suffers the highest number of fatalities among all industries: as shown in Fig. 1(a), one in five worker deaths (i.e., 874 out of 4,251) in private industry occurred in construction. Over the past decade, the number of worker deaths in construction (9,836 in 2005-2014) was even 44\% higher than the number of fatalities from American wars and military operations (6,830 in 2001-2014), and these losses have imposed a tremendous toll on workers' families, the industry, and the nation. Each fatal occupational injury in construction is estimated to cost 5.2 million dollars; given the total number of fatalities in construction shown in Fig. 1(b), the financial burden on the nation is enormous (e.g., 4,544 million dollars in 2014). To protect the nation's construction workforce and economy, there is an urgent need to develop new methods for improving construction safety.
The current practice of sending compliance officers to perform safety inspections plays an important role in reducing safety hazards in construction workplaces, but this process is expensive because training compliance inspectors demands a high investment. Surveillance cameras are now widely used to monitor daily construction activities and fields, so massive collections of construction surveillance images/videos can easily be gathered from multiple sources (sites). This raises the following questions: (a) How can we leverage massive construction surveillance images with huge label uncertainties to support automatic identification of fall safety violations? (b) How can big data analytics technologies contribute to this specific application? (c) How can we bridge the large interdisciplinary knowledge gap between computer scientists and construction experts and leverage high-level human guidance to improve classifier training?
Developing a scalable learning framework that leverages massive construction surveillance images to support fast detection of fall safety violations is an under-explored interdisciplinary domain. In this project, we will focus on tackling the following challenging issues: (a) Computer-Understandable Representations of OSHA Safety Rules; (b) Huge Label Uncertainty; (c) Large-Scale Object Detection and Fall Safety Violation Identification; (d) Interdisciplinary Knowledge Gap and High-Level Human Guidance.
2. Research Tasks
To tackle these challenges, we will develop a scalable learning framework that leverages massive construction surveillance images to train large numbers of node classifiers over a visual hierarchy. This framework contains the following innovative research components: (1) Construction ontology to translate the OSHA safety rules into computer-understandable concepts (object classes of interest in construction sites);
(2) Label uncertainty reduction to generate large-scale labeled image instances (object regions with precise labels) for large numbers of object classes in construction sites;
(3) Visual hierarchy to support scalable indexing of large numbers of object classes;
(4) Scalable learning to train large-scale node classifiers iteratively over the visual hierarchy;
(5) Scalable visualization to enable interactive assessment of classifiers and hypotheses;
(6) Visual analytics to leverage high-level human guidance for improving classifier training.
3. Semantic Representations of OSHA Safety Rules
Fig. 2 presents a conceptual overview of the fall safety compliance inspection process in construction sites. This process contains three key steps: (a) identifying all potential unsafe conditions and acts (object classes of interest in construction sites), including floor openings, rooftops, precast wall erection, wall openings, skylights, structure sides/edges, open pits, and other excavations that may endanger site workers; (b) examining the potential risks of fall safety violations; and (c) assessing whether protective measures (i.e., a safety net system and/or guardrail system) should be provided. Therefore, the first task of this project is to translate all the OSHA safety rules into computer-understandable concepts (object classes of interest in construction sites). As shown in Fig. 3, we will develop an ontology-based framework to formulate such computer-understandable representations of the OSHA safety rules.
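To make this concrete, the ontology can be sketched as a tree whose nodes are computer-understandable object classes, each optionally annotated with the OSHA rule clauses that mention it. The sketch below is purely illustrative: the node names and the two cited clauses of the fall-protection standard (29 CFR 1926.501) are our own example choices, not the project's actual ontology.

```python
# Minimal illustrative sketch of an ontology node: it links a
# computer-understandable object class to the OSHA rules covering it.
class ConceptNode:
    def __init__(self, name, rules=None):
        self.name = name              # object class of interest
        self.rules = rules or []      # OSHA rule IDs mentioning this class
        self.children = []            # finer-grained sub-concepts

    def add_child(self, child):
        self.children.append(child)
        return child

    def leaf_classes(self):
        """Collect the concrete (leaf) object classes under this concept."""
        if not self.children:
            return [self.name]
        leaves = []
        for c in self.children:
            leaves.extend(c.leaf_classes())
        return leaves

# A tiny fragment built from Section 3's examples (rule IDs illustrative).
root = ConceptNode("unsafe_condition")
openings = root.add_child(ConceptNode("opening", rules=["1926.501(b)(4)"]))
openings.add_child(ConceptNode("floor_opening"))
openings.add_child(ConceptNode("skylight"))
root.add_child(ConceptNode("rooftop", rules=["1926.501(b)(10)"]))

print(root.leaf_classes())  # ['floor_opening', 'skylight', 'rooftop']
```

The leaf classes enumerated this way become the concrete detection targets for the later learning tasks, while the non-leaf nodes supply the abstract labels discussed in Section 4.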
4. Label Uncertainty Reduction
Besides the common notions of volume, velocity, veracity, and variety, another defining characteristic of such big construction data is the low quality of the associated labels: massive construction surveillance images may suffer from huge label uncertainties (because of coarse labeling, abstract labeling, irrelevant labeling, and noisy labeling), which can be harmful to data-driven machine learning methods. To tackle the issue of abstract labeling, we will use our construction ontology to decompose each abstract label at a high-level non-leaf node into a set of object labels at the leaf nodes, and assign the construction surveillance images carrying that abstract label to all of these object labels. Obviously, this decomposition introduces irrelevant labeling, because each abstract label (at a high-level non-leaf node) may relate to multiple object labels (at the leaf nodes of the construction ontology). To address the issues of coarse labeling, irrelevant labeling, and noisy labeling jointly, we will develop two alternative approaches for label uncertainty reduction: (1) an automatic object-label alignment approach; and (2) a deep multiple instance learning (D-MIL) approach.
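The abstract-label decomposition step can be sketched as follows. This is a toy illustration under our own assumptions (the three-node "opening" fragment and image names are hypothetical): each image tagged with a non-leaf label is assigned to every leaf label beneath it, yielding weakly labeled bags that the object-label alignment or D-MIL stage must later prune.

```python
# Hypothetical ontology fragment: non-leaf labels map to child labels,
# leaves map to an empty list.
ontology = {
    "opening": ["floor_opening", "skylight", "wall_opening"],  # non-leaf
    "floor_opening": [], "skylight": [], "wall_opening": [],   # leaves
}

def decompose(label):
    """Return all leaf labels reachable from `label` in the ontology."""
    children = ontology.get(label, [])
    if not children:
        return [label]
    leaves = []
    for c in children:
        leaves.extend(decompose(c))
    return leaves

def assign_images(image_labels):
    """Map each image to its candidate leaf labels (a weakly labeled bag)."""
    return {img: decompose(lbl) for img, lbl in image_labels.items()}

bags = assign_images({"img_001.jpg": "opening", "img_002.jpg": "skylight"})
print(bags["img_001.jpg"])  # ['floor_opening', 'skylight', 'wall_opening']
```

Note that `img_001.jpg` now carries three candidate labels, only one of which is typically correct; this is precisely the irrelevant labeling that the two label-uncertainty-reduction approaches are designed to resolve.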
5. Scalable Indexing of Large-Scale Object Classes in Construction Sites
Our visual hierarchy will provide a good environment to: (1) determine the inter-related learning tasks automatically (i.e., the learning tasks for training the classifiers for the $B$ sibling child nodes under the same parent node are strongly inter-related); (2) keep our multi-task structural learning algorithm computationally affordable, because each parent node contains only a small number of sibling child nodes; (3) support scalable indexing of large-scale object classes in a coarse-to-fine fashion and significantly reduce computational costs by ruling out unlikely groups of object classes (i.e., irrelevant high-level non-leaf nodes) at an early stage; and (4) achieve an iterative solution for scalable learning of large numbers of node classifiers.
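The coarse-to-fine pruning behavior in item (3) can be sketched with a toy traversal. The hierarchy, node names, scores, and threshold below are all made up for illustration: the scores stand in for node-classifier confidences on a single test region, and any branch scored below the threshold is discarded without ever evaluating its descendants.

```python
# Illustrative coarse-to-fine search over a visual hierarchy: whole groups
# of object classes are ruled out at an early stage. All names, scores,
# and the threshold are hypothetical.
tree = {
    "root": ["vehicles", "openings"],
    "vehicles": ["excavator", "crane"],
    "openings": ["floor_opening", "skylight"],
}
scores = {  # stand-in node-classifier confidences for one test region
    "vehicles": 0.1, "openings": 0.9,
    "excavator": 0.2, "crane": 0.1, "floor_opening": 0.8, "skylight": 0.3,
}

def coarse_to_fine(node, threshold=0.5):
    """Descend the hierarchy, pruning children scored below threshold."""
    children = tree.get(node, [])
    if not children:                    # leaf: a concrete object class
        return [node]
    hits = []
    for child in children:
        if scores[child] >= threshold:  # rule out unlikely branches early
            hits.extend(coarse_to_fine(child, threshold))
    return hits

print(coarse_to_fine("root"))  # ['floor_opening']
```

Because the "vehicles" subtree is pruned at the first level, its leaf classifiers are never run; with large numbers of object classes this is what reduces the per-image detection cost from linear in the number of classes toward logarithmic.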
6. Scalable Learning of Large Numbers of Node Classifiers over Visual Hierarchy
To predict the potential risks of fall safety violations in construction sites, we will first learn classifiers for detecting large numbers of object classes from massive construction surveillance images. To achieve fast detection of large numbers of object classes in construction sites, we will develop a scalable learning framework to train large-scale node classifiers iteratively over the visual hierarchy: (1) the visually-similar object classes at the sibling leaf nodes under the same parent node of our visual hierarchy share significant common visual properties and are usually hard to distinguish, so we will develop a multi-task structural learning algorithm that trains their inter-related classifiers jointly to enhance their discrimination power; (2) for the coarse-grained groups of object classes at the sibling non-leaf nodes under the same parent node, we will develop a hierarchical multi-task structural learning algorithm to train their inter-related classifiers iteratively.
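One common way to realize such joint training of sibling classifiers is mean-regularized multi-task learning, where each sibling's weight vector is pulled toward the siblings' shared mean, encoding the assumption that sibling classes share common visual properties. The sketch below is our own simplification (plain logistic classifiers, toy 2-D data, made-up hyperparameters), not the project's actual multi-task structural learning formulation.

```python
import math

# Mean-regularized multi-task sketch: B sibling classifiers are trained
# jointly, with each weight vector regularized toward the siblings' mean.
# Data, features, and hyperparameters are made up for illustration.
def train_siblings(tasks, lr=0.1, lam=0.5, epochs=200):
    """tasks: list of (X, y) per sibling class; returns per-task weights."""
    d = len(tasks[0][0][0])
    W = [[0.0] * d for _ in tasks]
    for _ in range(epochs):
        mean = [sum(w[j] for w in W) / len(W) for j in range(d)]
        for t, (X, y) in enumerate(tasks):
            for x, label in zip(X, y):
                z = sum(wj * xj for wj, xj in zip(W[t], x))
                p = 1.0 / (1.0 + math.exp(-z))       # logistic prediction
                for j in range(d):
                    # gradient of logistic loss + pull toward shared mean
                    grad = (p - label) * x[j] + lam * (W[t][j] - mean[j])
                    W[t][j] -= lr * grad
    return W

# Two sibling tasks with similar (toy) positive/negative regions.
tasks = [
    ([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]], [1, 1, 0]),
    ([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0]], [1, 1, 0]),
]
W = train_siblings(tasks)
z = sum(wj * xj for wj, xj in zip(W[0], [1.5, 1.5]))
print(z > 0)  # True: sibling 0 scores a clearly positive point as positive
```

The `lam` term is what couples the tasks: with `lam = 0` the siblings train independently, while larger values force them to share more structure, which is the intended effect when sibling classes are visually similar and individually data-poor.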
7. Scalable Visualization for Interactive Algorithm Evaluation
Beyond validating our scalable learning algorithms according to the accuracy rates of the learned classifiers, we will further develop a scalable visualization framework for interactive algorithm evaluation, allowing both construction experts and computer scientists to: (a) communicate their observations explicitly and bridge their large interdisciplinary knowledge gap effectively; (b) assess the correctness of the underlying metrics (or kernels) for similarity characterization and explore the clusters of image instances; (c) evaluate the discrimination power of various feature subsets; (d) assess the correctness of the boundaries of the classifiers under different hypotheses; and (e) evaluate the significance of the inter-classifier margins, which directly reflect the discrimination power of the classifiers (e.g., a larger margin corresponds to higher discrimination power). Such an interactive evaluation process allows users to directly see what is missing, what is expected, what is unexpected, and what is conjectured.
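For part (e), the margin quantity being visualized can be sketched as follows. For a linear classifier $(w, b)$, the geometric margin of a correctly classified point $x$ with label $y \in \{-1, +1\}$ is $y\,(w \cdot x + b)/\|w\|$, and the dataset margin is the minimum over all points. The weights and points below are made-up toy values for illustration only.

```python
import math

# Geometric margin of a linear classifier (w, b) over a labeled dataset:
# the minimum signed distance of any point to the decision boundary.
def dataset_margin(w, b, X, y):
    norm = math.sqrt(sum(wj * wj for wj in w))
    return min(yi * (sum(wj * xj for wj, xj in zip(w, x)) + b) / norm
               for x, yi in zip(X, y))

X = [[2.0, 2.0], [-2.0, -2.0]]
y = [1, -1]
wide = dataset_margin([1.0, 1.0], 0.0, X, y)     # boundary far from both points
narrow = dataset_margin([1.0, 0.0], -1.5, X, y)  # boundary close to one point
print(wide > narrow)  # True: larger margin -> higher discrimination power
```

Plotting such margins side by side for competing classifiers is one simple way the visualization framework can let users compare discrimination power at a glance.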
Our evaluation plan contains three parts: (a) evaluating the accuracy rates of the learned classifiers over large-scale test images in our labs; (b) leveraging our scalable visualization technique for interactive assessment of the correctness of the learned classifiers; and (c) field testing of our system in real environments by our industry partners.
8. Preliminary Publications:
F. Dai, F. Ye, J. Fan, ``Learning in unordered and static daily construction site photos for roof detection: A step toward automated safety performance monitoring for work on rooftops,'' 16th Intl Conf. on Computing in Civil and Building Engineering (ICCCBE2016), 2016.