Aug 30, 2013   Presenter 1: Ali      Presenter 2: John      Lunch: John
Sep 06, 2013   Presenter 1: Chen     Presenter 2: Nader     Lunch: Nader
Sep 13, 2013   Presenter 1: Farhan   Presenter 2: Jiaping   Lunch: Shane
See here if you want to modify the schedule.
Please check the iLab Forum for current events and discussions.
Most of the action on this web site centers on our C++ Neuromorphic Vision Toolkit, which is updated daily, and on our publication server.
June, 2010: New paper "Of bits and wows: A Bayesian theory of surprise with applications to attention" by P. F. Baldi and L. Itti came out in Neural Networks.
Brief Summary: What makes some information more interesting, worthy of our attention, or surprising? The concept of surprise is central to our everyday life, yet no widely accepted mathematical theory currently exists to measure surprise. One often talks about the "wow factor," but this notion has remained an imprecise, qualitative one: is winning the lottery more surprising than 9/11, and by how much? So far, scientists and mathematicians have not been able to provide an exact answer.
Here we propose a formal Bayesian definition of surprise that is the only consistent formulation under a minimal set of mathematical axioms. Surprise quantifies how data affects a natural or artificial observer by measuring the difference between the observer's posterior beliefs (after an event was observed) and prior beliefs (before the event). This facet of information is important in dynamic situations where beliefs change, in particular during learning and adaptation. Using this framework, we measure the extent to which humans look toward surprising things while watching television and video games. For this, we build a computer vision neural network architecture capable of computing surprise over images and videos. Hypothesizing that surprising data ought to attract people's attention, the output of this architecture is used in a psychophysical experiment to analyze human eye movements while people watch videos. Surprise is found to yield robust performance at predicting human gaze: volunteers in the experiment were strongly attracted toward those objects and events in the videos that the new theory predicted to be the most surprising. The resulting theory of surprise is applicable across different spatio-temporal scales, modalities, and levels of abstraction.
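To make the definition concrete, the short sketch below (a minimal illustration, not the toolkit's actual code) computes surprise as the Kullback-Leibler divergence between an observer's posterior and prior beliefs over a small set of hypotheses. The function name, the two-hypothesis example, and the reported unit (bits; the paper defines its own "wow" unit) are assumptions made purely for illustration.

    #include <cmath>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Bayesian surprise: KL(posterior || prior)
    //   = sum_i posterior[i] * log(posterior[i] / prior[i])
    // Measures how much observing the data changed the observer's beliefs.
    double surprise(const std::vector<double>& prior,
                    const std::vector<double>& posterior) {
        double kl = 0.0;
        for (std::size_t i = 0; i < prior.size(); ++i)
            if (posterior[i] > 0.0 && prior[i] > 0.0)
                kl += posterior[i] * std::log(posterior[i] / prior[i]);
        return kl; // in nats; divide by log(2) to express in bits
    }

    int main() {
        // Hypothetical beliefs over two hypotheses, before and after the data.
        std::vector<double> prior     = {0.5, 0.5};
        std::vector<double> posterior = {0.9, 0.1};
        std::cout << "Surprise: " << surprise(prior, posterior) / std::log(2.0)
                  << " bits\n";
        return 0;
    }

Note how an event that leaves beliefs unchanged (posterior equal to prior) yields zero surprise, however improbable the event itself was; surprise measures belief change, not rarity.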
June, 2010: New paper "A Bayesian model of efficient visual search and recognition" by L. Elazary and L. Itti came out in Vision Research.
Brief Summary: Humans are very efficient at finding known targets in a visual scene, despite the limited field of view that the fovea provides. When searching for our car in a crowded parking lot, we do not systematically examine every location in a grid pattern. Instead, we examine particular locations that our brain deems worth looking at and concentrate our visual resources on them. How this attentional process takes place has long been thought to involve an interaction between the raw visual features extracted from the retina (bottom-up processes) and our known model of the target (top-down processes). However, the exact details of this process have eluded scientists for years.
In this research, we model the interaction between the top-down and bottom-up processes to create a system capable of efficient search and recognition. Inspired by the way real biological neurons work, the model shapes the properties of its individual feature detectors so that they respond more strongly when the target is present. This results in a visual map with hot spots, which correspond to the likeliest locations of the object of interest. The model is cast in a Bayesian framework, which enables it to learn how to shape the feature detectors, as well as to recognize the object based on the detectors' responses. This is achieved by learning the likelihood of salient features of various objects and then using these likelihoods to compute probable locations of objects during a search task. The result of this work is a single, computationally efficient system that serves a dual purpose. When given a location in a visual scene, the system can identify the object at that location. Alternatively, given a description of a known object, the system can produce possible locations for this object in the scene.
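As a rough sketch of this kind of top-down biasing (a simplified illustration under assumed data structures, not the model from the paper), the code below weights a set of bottom-up feature maps by learned target likelihoods and reads out the hottest cell as the candidate target location. All names, the flattened-grid layout, and the numbers are hypothetical.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // A feature map is a flattened W x H grid of detector responses.
    using Map = std::vector<double>;

    // Combine feature maps, weighting each by how likely its feature is
    // given the target (weights assumed to be learned elsewhere).
    Map biasedCombination(const std::vector<Map>& featureMaps,
                          const std::vector<double>& targetLikelihoods) {
        Map out(featureMaps.front().size(), 0.0);
        for (std::size_t f = 0; f < featureMaps.size(); ++f)
            for (std::size_t i = 0; i < out.size(); ++i)
                out[i] += targetLikelihoods[f] * featureMaps[f][i];
        return out;
    }

    int main() {
        // Two 2x2 feature maps (e.g., "yellow" and "elongated" detectors).
        std::vector<Map> maps = {{0.1, 0.9, 0.2, 0.1},
                                 {0.2, 0.8, 0.1, 0.3}};
        std::vector<double> likelihoods = {0.7, 0.3}; // learned per feature
        Map prob = biasedCombination(maps, likelihoods);

        // The hot spot in the combined map is the candidate target location.
        std::size_t best = 0;
        for (std::size_t i = 1; i < prob.size(); ++i)
            if (prob[i] > prob[best]) best = i;
        std::cout << "Most likely target location: cell " << best << "\n";
        return 0;
    }

The same learned likelihoods can be read in the other direction: given a location, the pattern of detector responses there can be scored against each object's likelihoods to recognize what is present, which is the dual use described above.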
The system was shown to work on various day-to-day objects as well as satellite images (117,174 images of 1,147 objects were tested, plus 40 satellite images). Since satellite images contain a lot of data, it is often difficult for humans to quickly find places of interest in them. In this task, the model was set to find regions containing houses, so that humans can determine what to do with these regions (provide food, estimate the disaster area, etc.). Other applications could involve the blind and the visually impaired, by helping them find day-to-day objects.
February 2010: New paper "Training top-down attention improves performance on a triple-conjunction search task" by F. Baluch and L. Itti came out in PLoS ONE.
Brief Summary: Whether it's searching for lost keys in a drawer, a favorite yellow top in the walk-in closet, or a block of exotic cheese in the refrigerator, humans conduct visual search tasks on a daily basis. The brain mechanisms that enable this ability to find what we are looking for rely heavily on directing attention to the right parts of space and the right features; for example, when looking for bananas, one would look in likely locations for a yellow, elongated shape.
In this study, we tested whether humans can improve their ability to quickly capture the essence of a complex visual target with three distinct features and later find such targets in a cluttered display. We found that humans are able to improve their ability to guide attention to items on the display that match the features of the target. A critical manipulation in our experiment was that we challenged our subjects with a new target on every one of the 1,000 trials conducted. Our results therefore demonstrate a general improvement in attentional allocation to features that match the target, rather than rote memorization of a specific set of items. Our results also show that, when finding items, certain features play a stronger role than others; in our case, color served as a strong cue for finding a particular target.
An understanding of these mechanisms of visual search informs us about the way we conduct our many everyday searches for different items. Further, the results of this work could help improve expertise-acquisition training programs for image analysts such as radiologists, airport security staff, and defense teams analyzing satellite imagery, by providing quantitative measures of performance on their respective tasks and by suggesting modifications of an image to better guide the analysts' attention to important target-defining features.
Browse our collection of image databases here.
Copyright © 2000 by the University of Southern California, iLab and Prof. Laurent Itti