Abstract




V. Navalpakkam, L. Itti, Towards a Unified Model for Attention and Recognition, In: Proc. Society for Neuroscience Annual Meeting (SFN'03), Nov 2003. (Cited by 49)

Abstract: Primate vision recruits at least three main components: bottom-up attention, top-down attention, and object recognition. The first component directs gaze towards visually salient locations in a scene, the second directs gaze towards task-relevant targets, and the third recognizes objects. Most modeling efforts treat these problems independently and consequently use different sets of low-level features and distinct computational strategies. Our concern with such approaches stems from the constraints on available resources: how can one visual system, such as ours, accommodate multiple low-level visual subsystems? This motivates us to investigate whether and how bottom-up attention, top-down attention and object recognition may share resources and be related. Towards this end, we designed, implemented and tested our UNARE model for UNified Attention and REcognition. Our model uses the same low-level features to find visually salient objects, to learn object representations, to detect task-relevant target objects, and to recognize them. We achieve top-down attention by biasing the bottom-up attentional system with the learned target representation. To recognize the object at the currently attended location, we match its features against the learned representations. We tested our model on 343 images ranging from artificial images of geometrical objects to natural images containing objects such as soda cans and various signs in diverse backgrounds. On average, our model significantly accelerated attention towards the target, detecting it 1.8-16.4 times faster than the bottom-up attention model even in scenes with poor resolution, significant noise, clutter, or complex backgrounds. Recognition produced few false positives (0-10 percent) and false negatives (0-21 percent).
Overall, the performance of our model was remarkable given its simplicity, and suggests that bottom-up attention, top-down attention and object recognition may share resources extensively and be intimately related.
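The core mechanism described above, one set of low-level feature maps shared for learning a target representation, for top-down biasing of saliency, and for recognition at the attended location, can be sketched roughly as follows. This is a minimal illustrative sketch, not the actual UNARE implementation: the feature extractor here is a stand-in (random per-channel filters) for the model's real color, intensity, and orientation channels, and all function names and the distance threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_feature_maps(image, n_features=4):
    # Stand-in for shared low-level feature extraction (the model's
    # color/intensity/orientation channels); here, random per-channel
    # gains applied pixel-wise, purely for illustration.
    gains = np.abs(rng.normal(size=n_features)) + 0.1
    return np.stack([g * np.abs(image) for g in gains])

def learn_target(feature_maps, target_mask):
    # Learn a target representation: mean response of each feature
    # channel over the target region.
    return np.array([fm[target_mask].mean() for fm in feature_maps])

def biased_saliency(feature_maps, target_rep=None):
    # Bottom-up: uniform combination of feature maps.
    # Top-down: weight each channel by the learned target response,
    # boosting channels in which the target is strong.
    weights = np.ones(len(feature_maps)) if target_rep is None else target_rep
    weights = weights / weights.sum()
    return np.tensordot(weights, feature_maps, axes=1)

def recognize(feature_maps, loc, target_rep, threshold=0.5):
    # Recognition reuses the same features: match the feature vector
    # at the attended location against the learned representation.
    vec = feature_maps[:, loc[0], loc[1]]
    return np.linalg.norm(vec - target_rep) < threshold

# Toy scene: a single bright "target" patch on a dark background.
image = np.zeros((8, 8))
image[2:4, 5:7] = 1.0
target_mask = image > 0.5

fmaps = extract_feature_maps(image)
rep = learn_target(fmaps, target_mask)           # shared features -> representation
sal = biased_saliency(fmaps, rep)                # representation biases saliency
loc = np.unravel_index(sal.argmax(), sal.shape)  # attend to the saliency peak
found = recognize(fmaps, loc, rep)               # same features used to recognize
```

In this toy scene the top-down-biased saliency peak falls on the target patch and recognition succeeds there, illustrating how one feature set can serve attention and recognition jointly.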

Themes: Computational Modeling, Human Psychophysics, Model of Bottom-Up Saliency-Based Visual Attention, Scene Understanding

 

Copyright © 2000-2007 by the University of Southern California, iLab and Prof. Laurent Itti.
This page generated by bibTOhtml on Tue 09 Jan 2024 12:10:23 PM PST