Tweeting cameras for event detection

In the modern big data world, the proliferation of physical and social sensors offers a great opportunity to explore both cyber and physical situation awareness. However, social sensor fusion for situation awareness is still in its infancy and lacks a unified framework to aggregate and compose real-time media streams from diverse sensors and social network platforms. We propose a new paradigm in which sensor and social information are fused to facilitate event detection or customized services. Our proposal consists of 1) a tweeting camera framework in which cameras can tweet event-related information; 2) a hybrid social sensor fusion algorithm that utilizes spatio-temporal-semantic information from multimodal sensors; and 3) a new social-cyber-physical paradigm in which humans and sensors collaborate for event fusion.

Tweeting cameras for event detection, Y. Wang and M. S. Kankanhalli, In Proceedings of the 24th International Conference on World Wide Web, WWW ’15, pages 1231–1241, 2015.

Socializing multimodal sensors for information fusion, Y. Wang, In Proceedings of the 23rd ACM International Conference on Multimedia, MM ’15, pages 653–656, 2015. (Doctoral Symposium Best Paper Award).


Real-time Photography Assistance Using Social Media

In this work, we have developed a machine-learning-based photography model that can assist a user in capturing high-quality photographs. As scene composition and camera parameters play a vital role in the aesthetics of a captured image, the proposed method addresses the problem of learning photographic composition and camera parameters. Further, we observe that context is an important factor from a photography perspective; we therefore augment the learning with associated contextual information. The proposed method utilizes publicly available photographs along with social media cues and associated metainformation in photography learning. We define context features based on factors that have an impact on photography, such as time, geolocation, environmental conditions and type of image. We also propose the idea of computing a photographic composition basis, eigenrules and baserules, to support our composition learning. The proposed system can be used to provide feedback to the user regarding scene composition and camera parameters while the scene is being captured. It can also recommend positions in the frame where people should stand for better composition. Moreover, it provides camera motion guidance for pan, tilt and zoom to help the user improve scene composition.

Real-Time Assistance in Multimedia Capture Using Social Media, Y. Rawat, In Proceedings of the ACM International Conference on Multimedia (Doctoral Symposium), Oct 2015.

Context-Aware Photography Learning for Smart Mobile Devices, Y. Rawat and M. Kankanhalli, In ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 12, No. 1s, Oct 2015.

Context-Based Photography Learning using Crowdsourced Images and Social Media, Y. Rawat and M. Kankanhalli, In Proceedings of the ACM International Conference on Multimedia, Grand Challenge (MM), 2014.


Decision-Theoretic Approach to Maximizing Observation of Multiple Targets in Multi-Camera Surveillance

This work presents a novel decision-theoretic approach to control and coordinate multiple active cameras for observing a number of moving targets in a surveillance system. This approach offers the advantages of being able to (a) account for the stochasticity of targets’ motion via probabilistic modeling, and (b) address the tradeoff between maximizing the expected number of observed targets and the resolution of the observed targets through stochastic optimization. One of the key issues faced by existing approaches in multi-camera surveillance is scalability with an increasing number of targets. We show how scalability can be improved by exploiting the problem structure: as proven analytically, our decision-theoretic approach incurs time that is linear in the number of targets to be observed during surveillance. As demonstrated empirically through simulations, our proposed approach can achieve high-quality surveillance of up to 50 targets in real time, and its surveillance performance degrades gracefully with an increasing number of targets. We also demonstrate our proposed approach with real AXIS 214 PTZ cameras in maximizing the number of Lego robots observed at high resolution over a surveyed rectangular area. The results are promising and clearly show the feasibility of our decision-theoretic approach in controlling and coordinating the active cameras in a real surveillance system.
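
As an illustration of the "expected number of observed targets" objective, the following is a minimal sketch and not the authors' algorithm. It assumes a discretized surveyed area (numbered cells), a per-target probability distribution over cells, and a small set of candidate fields of view per camera; the names `expected_observed` and `best_joint_state` are hypothetical. It searches all joint camera states exhaustively and ignores the resolution tradeoff described above.

```python
from itertools import product

def expected_observed(config, targets):
    """Expected number of targets inside at least one camera's field of view.

    config:  one set of covered cells per camera (hypothetical discretization).
    targets: one dict per target mapping cell -> probability of being there.
    """
    covered = set().union(*config)
    # Linearity of expectation: add up each target's probability of
    # lying in a covered cell. The cost is one pass over the targets.
    return sum(
        sum(p for cell, p in belief.items() if cell in covered)
        for belief in targets
    )

def best_joint_state(camera_states, targets):
    """Exhaustively pick one field of view per camera (illustration only)."""
    return max(
        product(*camera_states),
        key=lambda config: expected_observed(config, targets),
    )

# Two cameras, each with two candidate fields of view over cells 0..3.
camera_states = [
    [frozenset({0, 1}), frozenset({1, 2})],  # camera A
    [frozenset({3}), frozenset({0})],        # camera B
]
# Predicted position distributions for three moving targets.
targets = [
    {0: 0.75, 1: 0.25},
    {2: 0.5, 3: 0.5},
    {1: 0.25, 2: 0.75},
]

best = best_joint_state(camera_states, targets)
print(best, expected_observed(best, targets))
```

In this toy setting, evaluating one joint state is a single sum over the targets, while the exhaustive search over joint states grows with the number of cameras and candidate views.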

Decision-Theoretic Approach to Maximizing Observation of Multiple Targets in Multi-Camera Surveillance, P. Natarajan, T.N. Hoang, K.H. Low and M.S. Kankanhalli, In International Conference on Autonomous Agents and MultiAgent Systems (AAMAS2012), pp. 155-162, Spain, June 4-8, 2012.


Automatic Retargeting and Reprojection for Home Video Editing

In this work, we present an automated post-processing method for home-produced videos based on frame “interestingness”. The single input video clip is treated as a long take, and film editing operations for a sequence shot are performed. The proposed system automatically adjusts the distribution of interestingness, both spatially and temporally, in the video clip. We use the idea of video retargeting to introduce fake camera work and manipulate spatial interestingness; we then perform video re-projection to introduce motion rhythm and modify the temporal distribution of interestingness. Experimental results show improved interestingness in our output video clips.

Video Retargeting for Aesthetic Enhancement, Y. Xiang and M.S. Kankanhalli, ACM International Conference on Multimedia (ACMMM 2010), Florence, Oct 2010.


A Synaesthetic Approach for Image Slideshow Generation

We present a novel automatic image slideshow system that explores a new medium of images and music. It can be regarded as a new criterion for image selection and slideshow composition. Based on the idea of "hearing colors, seeing sounds" from the art of music visualization, equal importance is assigned to image features and audio properties for better synchronization. We minimize the aesthetic energy distance between visual and audio features. Given a set of images, a subset is selected by correlating image features with the input audio properties. The selected images are then synchronized with the music subclips by their audio-visual distance. An inductive image display approach is introduced for common display devices.
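
The selection and synchronization steps can be pictured with a small sketch. It is not the paper's method: it assumes images and music subclips have already been mapped into a shared feature space, uses plain Euclidean distance as a stand-in for the audio-visual distance, and assigns images greedily rather than minimizing a global energy. The function name `select_and_sync` and the feature values are hypothetical.

```python
import math

def select_and_sync(image_features, clip_features):
    """For each music subclip, in playback order, pick the closest unused image.

    Both arguments are lists of equal-length numeric tuples in an assumed
    shared feature space. Returns one image index per subclip.
    """
    if len(image_features) < len(clip_features):
        raise ValueError("need at least as many images as music subclips")
    unused = set(range(len(image_features)))
    order = []
    for clip in clip_features:
        # sorted() keeps the choice deterministic if two images tie.
        best = min(sorted(unused), key=lambda i: math.dist(image_features[i], clip))
        order.append(best)
        unused.remove(best)
    return order

# Four candidate images and three music subclips (made-up 2-D features).
images = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5), (0.1, 0.2)]
clips = [(0.1, 0.9), (0.6, 0.4), (1.0, 0.0)]

slideshow = select_and_sync(images, clips)
print(slideshow)  # image index shown during each subclip
```

Images that are never chosen (index 3 here) are simply left out of the slideshow, which mirrors the idea of selecting a subset of the input images.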

A Synaesthetic Approach for Image Slideshow Generation, Y. Xiang and M.S. Kankanhalli, IEEE International Conference on Multimedia and Expo (ICME2012), Melbourne, July 2012.


Affect-based Adaptive Presentation of Home Videos

In recent times, the proliferation of multimedia devices and reduced costs of data storage have enabled people to easily record and collect a large number of home videos; furthermore, this collection is growing with time. With the popularity of participatory media such as YouTube and Facebook, problems arise when people intend to share their home videos with others. The first problem is that different people might be interested in different video content. Given the number of home videos, manually selecting suitable content for people with different interests is a hard and time-consuming task. Secondly, as short videos are becoming more and more popular in media sharing applications, people need to manually cut and edit home videos, which is again a tedious task. In this paper, we propose a method that employs affective analysis to automatically create video presentations from home videos. Our novel method adaptively creates presentations based on three properties: emotional tone, local main character and global main character. A novel sparsity-based affective labeling method is proposed to identify the emotional content of the videos. The local and global main characters are determined by applying face recognition in each shot. To demonstrate the proposed method, three kinds of presentations are created for family, acquaintance and outsider. Experimental results show that our method is very effective in video sharing and that users are satisfied with the videos generated by our method.
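
To make the local and global main character idea concrete, here is a minimal sketch under stated assumptions. It assumes a face recognizer has already produced one identity label per detected face in each shot, and it takes "main character" to mean the most frequently recognized identity (per shot for local, over the whole video for global). That frequency-based definition and the name `main_characters` are assumptions for illustration; the paper's actual criteria may differ.

```python
from collections import Counter

def main_characters(shots):
    """Return (local main character per shot, global main character).

    shots: list of shots, each a list of recognized face identity labels.
    A shot without any recognized face gets None as its local main character.
    """
    local = [
        Counter(faces).most_common(1)[0][0] if faces else None
        for faces in shots
    ]
    totals = Counter(face for faces in shots for face in faces)
    global_main = totals.most_common(1)[0][0] if totals else None
    return local, global_main

# Hypothetical recognition output for a four-shot home video.
shots = [
    ["mum", "mum", "kid"],
    ["kid", "kid", "dad"],
    [],
    ["kid", "mum", "mum", "mum"],
]

local, global_main = main_characters(shots)
print(local, global_main)
```

A presentation for a particular audience could then, for example, prefer shots whose local main character matches a person that audience knows.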

Affect-based Adaptive Presentation of Home Videos, X.H. Xiang and M.S. Kankanhalli, ACM International Conference on Multimedia (ACMMM 2011), Scottsdale, Nov 2011.


NUSEF: National University of Singapore Eye Fixation Database

We present analyses and results from an eye-tracking study to investigate "What attracts human visual attention in semantically rich images?". Eye gaze is a reliable indicator of visual attention and provides vital cues to infer the cognitive process underlying human image understanding. Our study demonstrates that visual attention is specific to interesting objects and actions, contrary to the notion that attention is subjective. The NUSEF (NUS Eye Fixation) database was acquired from undergraduate and graduate volunteers aged 18-35 years (μ=24.9, σ=3.4). The ASL™ eye-tracker was used to non-invasively record eye fixations as subjects free-viewed image stimuli. We chose a diverse set of 1024 × 728 resolution images, representative of various semantic concepts and capturing objects at varying scale, illumination and orientation, based on quality and aspect ratio constraints. Images comprised everyday scenes from Flickr, aesthetic content from, Google images and emotion-evoking IAPS [18] pictures.

If you use this database, please cite NUSEF as below:

An Eye Fixation Database for Saliency Detection in Images, R. Subramanian, H. Katti, N. Sebe, M. Kankanhalli and T.S. Chua, European Conference on Computer Vision (ECCV 2010), Heraklion, Greece, Sep 2010.


Copyright © 2012 Multimedia Analysis and Synthesis Lab