In the modern big-data world, the proliferation of physical and social sensors offers a great opportunity to explore both cyber and physical situation awareness. However, social sensor fusion for situation awareness is still in its infancy and lacks a unified framework to aggregate and compose real-time media streams from diverse sensors and social network platforms. We propose a new paradigm in which sensor and social information are fused to facilitate event detection and customized services. Our proposal consists of 1) a tweeting camera framework in which cameras can tweet event-related information; 2) a hybrid social sensor fusion algorithm utilizing spatio-temporal-semantic information from multimodal sensors; and 3) a new social-cyber-physical paradigm in which humans and sensors collaborate for event fusion.
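To make the hybrid fusion idea concrete, the sketch below scores whether a camera-detected event and a social post refer to the same incident by combining spatial, temporal and semantic similarity. All field names, decay scales, weights and the threshold are illustrative assumptions, not the published algorithm.

```python
import math
from dataclasses import dataclass

@dataclass
class Observation:
    """One report from a physical camera or a social post (illustrative schema)."""
    lat: float
    lon: float
    timestamp: float   # seconds since epoch
    keywords: set      # e.g. {"fire", "smoke"}

def spatial_score(a, b, scale_m=500.0):
    """Decay with great-circle (haversine) distance, in metres."""
    r = 6371000.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = math.radians(b.lat - a.lat)
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    dist = 2 * r * math.asin(math.sqrt(h))
    return math.exp(-dist / scale_m)

def temporal_score(a, b, scale_s=600.0):
    """Decay with the time difference between the two reports."""
    return math.exp(-abs(a.timestamp - b.timestamp) / scale_s)

def semantic_score(a, b):
    """Jaccard overlap of event keywords."""
    if not a.keywords or not b.keywords:
        return 0.0
    return len(a.keywords & b.keywords) / len(a.keywords | b.keywords)

def fuse(camera_obs, tweet_obs, weights=(0.4, 0.3, 0.3), threshold=0.5):
    """Weighted spatio-temporal-semantic score; True if the reports likely match."""
    ws, wt, wk = weights
    score = (ws * spatial_score(camera_obs, tweet_obs)
             + wt * temporal_score(camera_obs, tweet_obs)
             + wk * semantic_score(camera_obs, tweet_obs))
    return score, score >= threshold
```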
Tweeting cameras for
event detection, Y. Wang and M. S.
Kankanhalli, In Proceedings of the 24th
International Conference on World Wide Web,
WWW ’15, pages 1231–1241, 2015.
Socializing multimodal
sensors for information fusion, Y.
Wang, In Proceedings of the 23rd ACM
International Conference on Multimedia, MM ’15,
pages 653–656, 2015. (Doctoral
Symposium Best Paper Award).
Real-time Photography Assistance Using
Social Media
In this work we have developed a machine-learning-based photography model that can assist a user in capturing high-quality photographs. As scene composition and camera parameters play a vital role in the aesthetics of a captured image, the proposed method addresses the problem of learning photographic composition and camera parameters. Further, since context is an important factor from a photography perspective, we augment the learning with associated contextual information. The proposed method utilizes publicly available photographs along with social media cues and associated meta-information in photography learning. We define context features based on factors such as time, geolocation, environmental conditions and type of image, which have an impact on photography. We also propose the idea of computing a photographic composition basis, eigenrules and baserules, to support our composition learning. The proposed system can be used to give the user feedback on scene composition and camera parameters while the scene is being captured. It can also recommend positions in the frame where people should stand for better composition. Moreover, it provides camera motion guidance for pan, tilt and zoom to the user for improving scene composition.
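One plausible reading of a "composition basis" is a principal-component decomposition of composition feature vectors from well-rated photographs. The sketch below follows that reading: the mean vector plays the role of a baserule, the leading components act as eigenrules, and a candidate framing is scored by its reconstruction error. Feature definitions and dimensions are placeholder assumptions, not the paper's exact formulation.

```python
import numpy as np

def composition_basis(features, k=8):
    """PCA over composition feature vectors of highly rated photos.
    Principal components act as 'eigenrules'; the mean vector serves as
    the 'baserule' (illustrative stand-in for the paper's basis)."""
    base = features.mean(axis=0)
    centered = features - base
    # SVD yields principal directions without forming the covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return base, vt[:k]              # baserule, eigenrules

def composition_score(candidate, base, eigenrules):
    """Lower reconstruction error means the candidate framing is better
    explained by the learned composition rules."""
    centered = candidate - base
    coeffs = eigenrules @ centered
    reconstruction = eigenrules.T @ coeffs
    return float(np.linalg.norm(centered - reconstruction))

# Usage with random placeholders for real composition features
# (e.g. rule-of-thirds offsets, subject size, horizon position).
photos = np.random.rand(500, 32)
base, rules = composition_basis(photos, k=8)
print(composition_score(np.random.rand(32), base, rules))
```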
Real-Time Assistance in Multimedia Capture Using Social Media, Y. Rawat, In Proceedings of the ACM International Conference on Multimedia (Doctoral Symposium), Oct 2015.
Context-Aware
Photography Learning for Smart Mobile Devices,
Y. Rawat and M. Kankanhalli, In ACM
Transactions on Multimedia Computing,
Communications and Applications, Vol. 12,
No. 1s, Oct 2015.
Context-Based Photography Learning using Crowdsourced Images and Social Media, Y. Rawat and M. Kankanhalli, In Proceedings of the ACM International Conference on Multimedia, Grand Challenge (MM), 2014.
Decision-Theoretic Approach to Maximizing
Observation of Multiple Targets in
Multi-Camera Surveillance
This work presents a novel decision-theoretic approach to control and coordinate multiple active cameras for observing a number of moving targets in a surveillance system. The approach offers the advantages of being able to (a) account for the stochasticity of targets' motion via probabilistic modeling, and (b) address the tradeoff between maximizing the expected number of observed targets and the resolution of the observed targets through stochastic optimization. A key issue faced by existing approaches to multi-camera surveillance is scalability with an increasing number of targets. We show how scalability can be improved by exploiting the problem structure: as proven analytically, our decision-theoretic approach incurs time that is linear in the number of targets to be observed during surveillance. As demonstrated empirically through simulations, the proposed approach can achieve high-quality surveillance of up to 50 targets in real time, and its performance degrades gracefully with an increasing number of targets. We also demonstrate the approach with real AXIS 214 PTZ cameras in maximizing the number of Lego robots observed at high resolution over a surveyed rectangular area. The results are promising and clearly show the feasibility of our decision-theoretic approach for controlling and coordinating active cameras in a real surveillance system.
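The toy sketch below illustrates the expected-coverage objective and why it decomposes per target (linearity of expectation), which is the intuition behind the linear-time scaling. It brute-forces the joint camera action for clarity; the published approach instead uses stochastic optimization and a probabilistic motion model, so the observation probabilities and action names here are purely illustrative assumptions.

```python
import itertools

def expected_targets_observed(assignment, obs_prob):
    """Expected number of targets observed under a joint camera action.
    obs_prob[action][t] = probability that target t falls inside the
    camera's field of view when it takes that action. By linearity of
    expectation the objective is a sum over targets, hence linear cost
    in the number of targets."""
    num_targets = len(next(iter(obs_prob.values())))
    expected = 0.0
    for t in range(num_targets):
        # Probability that at least one camera observes target t.
        p_missed = 1.0
        for action in assignment:
            p_missed *= 1.0 - obs_prob[action][t]
        expected += 1.0 - p_missed
    return expected

def best_joint_action(actions_per_camera, obs_prob):
    """Exhaustive search over joint pan/tilt/zoom actions (toy-sized only);
    the published approach avoids this blow-up via stochastic optimization."""
    best, best_val = None, -1.0
    for joint in itertools.product(*actions_per_camera):
        val = expected_targets_observed(joint, obs_prob)
        if val > best_val:
            best, best_val = joint, val
    return best, best_val

# Two cameras, two candidate actions each, three targets (made-up numbers).
obs_prob = {
    "cam1_left": [0.9, 0.1, 0.0], "cam1_right": [0.0, 0.7, 0.6],
    "cam2_left": [0.2, 0.8, 0.1], "cam2_right": [0.0, 0.1, 0.9],
}
actions = [["cam1_left", "cam1_right"], ["cam2_left", "cam2_right"]]
print(best_joint_action(actions, obs_prob))
```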
Decision-Theoretic Approach to Maximizing Observation of Multiple Targets in Multi-Camera Surveillance, P. Natarajan, T.N. Hoang, K.H. Low and M.S. Kankanhalli, In Proceedings of the International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2012), pp. 155–162, Spain, June 4-8, 2012.
Video Retargeting for Aesthetic Enhancement
In this work, we present an automated post-processing method for home-produced videos based on frame “interestingness”. The input video clip is treated as a single long take, and film editing operations for a sequence shot are performed. The proposed system automatically adjusts the distribution of interestingness in the video clip, both spatially and temporally. We use the idea of video retargeting to introduce fake camera work and manipulate spatial interestingness; we then perform video re-projection to introduce motion rhythm and modify the temporal distribution of interestingness. Experimental results show improved interestingness in the output video clips.
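The "fake camera work" idea amounts to choosing a crop window per frame that concentrates interestingness, then smoothing the window trajectory so it reads as a deliberate pan. The sketch below shows that mechanism only; the interestingness map, window sizes and smoothing factor are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def best_crop(interest_map, crop_h, crop_w):
    """Top-left corner of the crop window with maximum total 'interestingness'
    (a virtual camera aimed at the most interesting region)."""
    # Integral image lets every window sum be computed in O(1).
    ii = np.pad(interest_map, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    best, best_sum = (0, 0), -np.inf
    h, w = interest_map.shape
    for y in range(h - crop_h + 1):
        for x in range(w - crop_w + 1):
            s = (ii[y + crop_h, x + crop_w] - ii[y, x + crop_w]
                 - ii[y + crop_h, x] + ii[y, x])
            if s > best_sum:
                best, best_sum = (y, x), s
    return best

def smooth_path(corners, alpha=0.8):
    """Exponentially smooth per-frame crop positions so the virtual camera
    work looks like a deliberate pan rather than jitter."""
    path, prev = [], None
    for c in corners:
        prev = c if prev is None else (alpha * prev[0] + (1 - alpha) * c[0],
                                       alpha * prev[1] + (1 - alpha) * c[1])
        path.append(prev)
    return path

# Toy usage: five random 90x160 interestingness maps, 72x128 crop windows.
corners = [best_crop(np.random.rand(90, 160), 72, 128) for _ in range(5)]
print(smooth_path(corners))
```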
Video Retargeting for
Aesthetic Enhancement, Y. Xiang and
M.S. Kankanhalli, ACM International
Conference on Multimedia (ACMMM 2010),
Florence, Oct 2010.
A Synaesthetic Approach for Image Slideshow
Generation
We present a novel automatic image slideshow system that explores a new combined medium of images and music. It can be regarded as a new image selection and slideshow composition criterion. Based on the idea of "hearing colors, seeing sounds" from the art of music visualization, equal importance is assigned to image features and audio properties for better synchronization. We minimize the aesthetic energy distance between visual and audio features. Given a set of images, a subset is selected by correlating image features with the properties of the input audio. The selected images are then synchronized with the music subclips according to their audio-visual distance. An inductive image display approach is also introduced for common display devices.
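A minimal sketch of the matching step: assuming visual and audio features have already been mapped into a common space, each music subclip is greedily paired with the unused image at the smallest feature distance. The feature definitions, normalization and greedy assignment are illustrative assumptions, not the paper's aesthetic energy formulation.

```python
import numpy as np

def match_images_to_music(image_feats, clip_feats):
    """Greedily assign each music subclip the unused image whose normalized
    feature vector is closest, i.e. minimising an 'audio-visual distance'.
    Both feature sets are assumed to live in a common space."""
    img = (image_feats - image_feats.mean(0)) / (image_feats.std(0) + 1e-8)
    clp = (clip_feats - clip_feats.mean(0)) / (clip_feats.std(0) + 1e-8)
    dist = np.linalg.norm(clp[:, None, :] - img[None, :, :], axis=-1)
    order, used = [], set()
    for c in range(len(clp)):
        candidates = [(d, i) for i, d in enumerate(dist[c]) if i not in used]
        d, i = min(candidates)
        used.add(i)
        order.append(i)
    return order   # order[c] = index of the image shown during subclip c

# Toy usage: 20 candidate images, 6 music subclips, 4-D features each.
images = np.random.rand(20, 4)
clips = np.random.rand(6, 4)
print(match_images_to_music(images, clips))
```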
A Synaesthetic Approach
for Image Slideshow Generation, Y.
Xiang and M.S. Kankanhalli, IEEE
International Conference on Multimedia and
Expo (ICME2012), Melbourne, July 2012.
Affect-based Adaptive Presentation of Home Videos
In recent times, the proliferation of multimedia devices and reduced costs of data storage have enabled people to easily record and collect a large number of home videos, and this collection keeps growing over time. With the popularity of participatory media such as YouTube and Facebook, problems are encountered when people intend to share their home videos with others. The first problem is that different people might be interested in different video content. Given the number of home videos, manually selecting the proper content for people with different interests is a time-consuming and hard task. Secondly, as short videos are becoming more and more popular in media sharing applications, people need to manually cut and edit home videos, which is again a tedious task. In this work, we propose a method that employs affective analysis to automatically create video presentations from home videos. Our method adaptively creates presentations based on three properties: emotional tone, local main character and global main character. A novel sparsity-based affective labeling method is proposed to identify the emotional content of the videos. The local and global main characters are determined by applying face recognition in each shot. To demonstrate the proposed method, three kinds of presentations are created, for family, acquaintances and outsiders. Experimental results show that our method is effective for video sharing and that users are satisfied with the videos it generates.
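One common way to realise sparsity-based labeling is sparse-representation classification: a shot's audio-visual feature vector is sparsely coded over a dictionary of labeled training exemplars, and the emotion whose exemplars give the lowest reconstruction error wins. The sketch below follows that reading as a stand-in; the tiny OMP solver, feature dimensions and class names are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def omp(D, y, k=5):
    """Tiny orthogonal matching pursuit: sparse code of y over dictionary D."""
    residual, support = y.copy(), []
    coeffs = np.zeros(D.shape[1])
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        sub = D[:, support]
        sol, *_ = np.linalg.lstsq(sub, y, rcond=None)
        residual = y - sub @ sol
    coeffs[support] = sol
    return coeffs

def affective_label(shot_feat, dictionary, labels):
    """Assign the emotion class whose exemplars best reconstruct the shot
    (sparse-representation classification as a stand-in for the paper's
    sparsity-based affective labeling)."""
    code = omp(dictionary, shot_feat)
    best_label, best_err = None, np.inf
    for lab in set(labels):
        mask = np.array([l == lab for l in labels])
        err = np.linalg.norm(shot_feat - dictionary[:, mask] @ code[mask])
        if err < best_err:
            best_label, best_err = lab, err
    return best_label

# Toy usage: 40-D audio-visual features, 30 labeled training shots.
D = np.random.rand(40, 30)
D /= np.linalg.norm(D, axis=0)        # unit-norm dictionary atoms
labels = ["happy"] * 10 + ["calm"] * 10 + ["sad"] * 10
print(affective_label(np.random.rand(40), D, labels))
```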
Affect-based Adaptive
Presentation of Home Videos, X.H.
Xiang and M.S. Kankanhalli, ACM
International Conference on Multimedia (ACMMM
2011), Scottsdale, Nov 2011.
NUSEF: National University of Singapore Eye
Fixation Database
We present analyses and results from an eye-tracking study that investigates the question "What attracts human visual attention in semantically rich images?". Eye gaze is a reliable indicator of visual attention and provides vital cues for inferring the cognitive processes underlying human image understanding. Our study demonstrates that visual attention is specific to interesting objects and actions, contrary to the notion that attention is subjective. The NUSEF (NUS Eye Fixation) database was acquired from undergraduate and graduate volunteers aged 18-35 years (μ=24.9, σ=3.4). An ASL eye-tracker was used to non-invasively record eye fixations as subjects free-viewed the image stimuli. We chose a diverse set of 1024 × 728 resolution images, representative of various semantic concepts and capturing objects at varying scale, illumination and orientation, selected under quality and aspect-ratio constraints. The images comprise everyday scenes from Flickr, aesthetic content from Photo.net, Google images, and emotion-evoking IAPS [18] pictures.
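Fixation datasets such as this one are typically consumed by converting each image's fixation points into a smoothed density map that serves as ground truth for saliency evaluation. The sketch below shows that standard step only; it does not assume NUSEF's file format or field names, and the smoothing width is an illustrative choice.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density_map(fixations, height=728, width=1024, sigma=30):
    """Turn a list of (x, y) fixation points into a smoothed density map,
    the usual ground truth for saliency evaluation. Pass coordinates parsed
    from the dataset; no particular file format is assumed here."""
    fmap = np.zeros((height, width), dtype=float)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            fmap[yi, xi] += 1.0
    fmap = gaussian_filter(fmap, sigma=sigma)   # approximate foveal spread
    return fmap / fmap.max() if fmap.max() > 0 else fmap

# Example with three synthetic fixations on a 1024 x 728 image.
print(fixation_density_map([(512, 364), (300, 200), (700, 500)]).shape)
```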
If you use
this database, please cite NUSEF as below:
An Eye Fixation
Database for Saliency Detection in Images,
R. Subramanian, H. Katti, N. Sebe, M.
Kankanhalli and T.S. Chua, European
Conference on Computer Vision (ECCV 2010),
Heraklion, Greece, Sep 2010.