Research Interests

My current research focuses on developing algorithms and systems that enable semantics-sensitive image and video retrieval.

My PhD students and I are thinking about the following problems: (a) How can we represent image/video content so that we can support semantic image/video classification and retrieval? This problem matters because the ability of low-level perceptual features to discriminate among semantic concepts depends largely on their quality, and their quality in turn depends largely on the image/video content representation framework. (b) Which concept-sensitive image/video patterns can enable such a concept-sensitive content representation framework, and how can we obtain these patterns automatically? (c) How can we bridge the semantic gap between low-level features and semantic concepts? Is concept learning good enough? What happens if the classification conditions differ from what we expect? (d) How can we achieve image/video database indexing in a high-dimensional feature space? (e) How can naive users specify their query concepts without training? Is relevance feedback the only solution? (f) What is the real application domain for multimedia information retrieval? Is it really needed, or is it only of interest to researchers? Are we in the same situation as AI (i.e., big promises with little progress)?

a. Image & Video Analysis

Content-based image and video analysis is the first step toward more effective image and video classification, indexing, and retrieval. I personally believe that image/video segmentation is a necessary step in supporting content-based image/video retrieval. My current work focuses on salient object detection and principal video shot detection.

The major steps of our salient object detection technique are shown in the following figure: (1) Our previous automatic image segmentation technique is first applied to obtain regions that are homogeneous in color or texture; (2) A Support Vector Machine (SVM), which is well suited to binary classification, is then used to classify these homogeneous image regions into two classes: regions related to a certain type of salient object versus regions irrelevant to it; (3) Connected homogeneous image regions with the same semantic label (assigned automatically by the SVM) are then aggregated into the corresponding salient object. Our preliminary results on salient object detection, such as beach and sunset, are given in the following figures.
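Step (3) above can be sketched as follows. This is only a minimal illustration, not the actual implementation: it assumes that the segmentation output is available as a small 2D grid of region ids (`region_map`, a hypothetical name) and that the classifier output is a per-region flag (`region_is_salient`, also hypothetical, standing in for the SVM decision). Connectivity is taken to be 4-connectivity, which is an assumption as well.

```python
from collections import deque


def aggregate_salient_objects(region_map, region_is_salient):
    """Group 4-connected salient regions into objects.

    region_map: 2D list of region ids (stand-in for the segmentation output).
    region_is_salient: dict mapping region id -> bool (stand-in for the
        classifier decision on each region).
    Returns a list of objects; each object is a sorted list of region ids.
    """
    rows, cols = len(region_map), len(region_map[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or not region_is_salient[region_map[r][c]]:
                continue
            # Breadth-first flood fill over neighboring salient pixels.
            members = set()
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                members.add(region_map[y][x])
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and region_is_salient[region_map[ny][nx]]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append(sorted(members))
    return objects


# Toy example: regions 0 and 2 touch each other, region 4 is isolated.
region_map = [
    [0, 0, 1, 4],
    [0, 2, 1, 1],
    [3, 2, 2, 1],
]
region_is_salient = {0: True, 1: False, 2: True, 3: False, 4: True}
objects = aggregate_salient_objects(region_map, region_is_salient)
```

In this toy grid, the adjacent salient regions are merged into one object while the isolated salient region forms its own object; regions labeled irrelevant never join any object.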

Salient objects and principal video shots are defined as concept-sensitive image/video patterns that can implicitly relate their low-level features to semantic concepts. Our results so far indicate that introducing salient objects and principal video shots helps achieve more effective semantic image/video classification.

b. Semantic Image and Video Concept Interpretation

Formulating semantic video concepts in terms of low-level perceptual features is a challenging research problem. To solve it, we have to address several key issues: (1) How many multimodal perceptual patterns are implicitly or explicitly related to a certain semantic concept? (2) How can we formulate this relationship so that semantic image or video concepts become computable?

c. Semantic Image & Video Classification

Classifying images and videos into semantic categories that are meaningful to humans is a promising approach to bridging the semantic gap and the gap between current systems and users' expectations. To do this, we are now working on three key issues: (1) How can we formulate the semantic image/video concepts? (2) Which practical classification algorithms can be used? (3) If labeled samples are insufficient or the concept drifts over time, can we still classify well?

Recently, we have developed an adaptive Expectation-Maximization (EM) algorithm to support more effective semantic image/video classification. In our experiments, the adaptive EM algorithm performs better than the traditional EM algorithm and the SVM technique. Our preliminary results on semantic medical video and image classification are given in the following figures. The average performance is encouraging.
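For readers unfamiliar with the baseline, the traditional EM algorithm mentioned above can be sketched for a one-dimensional Gaussian mixture. This is a minimal sketch of standard EM only, not the adaptive variant, whose details are not given here. The function name `em_gmm_1d`, the fixed iteration count, the unit initial variances, and the toy data are all assumptions made for illustration.

```python
import math


def em_gmm_1d(data, means, n_iter=50):
    """Standard EM for a 1-D Gaussian mixture (illustrative baseline only).

    data: list of floats.
    means: initial guesses for the component means (one per component).
    Returns (weights, means, variances) after n_iter iterations.
    """
    k = len(means)
    means = list(means)
    variances = [1.0] * k          # assumed initial variances
    weights = [1.0 / k] * k        # start from uniform mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            dens = [
                weights[j]
                * math.exp(-((x - means[j]) ** 2) / (2 * variances[j]))
                / math.sqrt(2 * math.pi * variances[j])
                for j in range(k)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid degenerate components
    return weights, means, variances


# Toy data: two well-separated groups of samples.
data = [0.8, 1.0, 1.2, 0.9, 1.1, 4.8, 5.0, 5.2, 4.9, 5.1]
weights, means, variances = em_gmm_1d(data, means=[0.0, 6.0])
```

On such well-separated toy data the estimated means settle near the centers of the two groups. In a classification setting, one mixture would typically be fitted per semantic concept, which is a design assumption here rather than a description of the method above.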

d. Applications

Content-based image and video retrieval has attracted researchers from various disciplines, each with their own algorithms and concerns. Unfortunately, there are few convincing stories to tell about successful applications of the research results. I am therefore also working to apply my research on content-based image retrieval (CBIR) and content-based video retrieval (CBVR) to multimedia medical education, multimedia search engines, and home healthcare.