Abstract


L. Itti, E. Niebur, J. Braun, C. Koch, A Trainable Model of Visual Attention, In: Proc. Society for Neuroscience Annual Meeting (SFN'96), p. 270, Nov 1996. (Cited by 37)

Abstract: We present a model of bottom-up selective visual attention, developed in accordance with the known physiology of the visual system of macaque monkeys and humans. The model comprises two interacting stages: the first is a fast, parallel, pre-attentive extraction of visual features (orientation, intensity and color, at several spatial scales), and the second is a slow, sequential mechanism for shifting focal attention (a Winner-Take-All neural network that selects the most conspicuous image location, and an inhibition-of-return mechanism that generates attentional shifts). The link between the two stages is a "saliency map", which topographically encodes local conspicuity in the visual scene and controls where the focus of attention is currently deployed [Koch and Ullman, Human Neurobiol. 1985;4:219-227]. Supervised learning can be introduced to bias the relative weights of the features in the construction of the saliency map and achieve some degree of specialization towards target detection tasks. Despite its simplicity, this model has performed well in reproducing human stimulus-driven, task-independent attention. Results with the model are comparable to those of humans on simple psychophysical tasks (e.g. A. Treisman's pop-out and conjunctive search). Good performance was also obtained in the detection of salient targets in natural color images, despite high noise, large variations in color and illumination, shadows, reflections and strong textures, which are known to be problematic for artificial vision systems (an interactive demonstration may be found at http://www.klab.caltech.edu/~itti/).
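
The two-stage loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how features are combined or how the Winner-Take-All network is realized, so the sketch assumes a plain weighted sum of feature maps, replaces the neural network with a simple argmax, and models inhibition of return by suppressing a square neighbourhood around each attended location. The names combine_feature_maps, attend and radius are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def combine_feature_maps(maps: Sequence[Grid], weights: Sequence[float]) -> Grid:
    """Weighted sum of same-sized feature maps into one saliency map.

    Assumption: a linear combination; the weights stand in for the
    feature biases that supervised learning could adjust.
    """
    rows, cols = len(maps[0]), len(maps[0][0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for fmap, w in zip(maps, weights):
        for r in range(rows):
            for c in range(cols):
                saliency[r][c] += w * fmap[r][c]
    return saliency


def attend(saliency: Grid, n_shifts: int, radius: int = 1) -> List[Tuple[int, int]]:
    """Return the sequence of attended (row, col) locations.

    Each step picks the most salient location (argmax as a stand-in for
    Winner-Take-All), then suppresses its neighbourhood so that the next
    step moves elsewhere (a crude inhibition of return).
    """
    s = [row[:] for row in saliency]  # work on a copy
    rows, cols = len(s), len(s[0])
    visited: List[Tuple[int, int]] = []
    for _ in range(n_shifts):
        r0, c0 = max(
            ((r, c) for r in range(rows) for c in range(cols)),
            key=lambda rc: s[rc[0]][rc[1]],
        )
        visited.append((r0, c0))
        for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1)):
                s[r][c] = float("-inf")
    return visited


# Toy feature maps (made-up values): one bright spot, one colored spot.
intensity = [[0.0, 0.0, 0.0, 0.0, 0.0],
             [0.0, 0.9, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0, 0.2]]
color = [[0.0, 0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0, 0.8]]

equal_scan = attend(combine_feature_maps([intensity, color], [1.0, 1.0]), 2)
biased_scan = attend(combine_feature_maps([intensity, color], [1.0, 0.0]), 2)
```

With equal weights the colored spot wins first; setting the color weight to zero makes the bright spot win instead, which is the kind of task-specific bias the abstract attributes to learned feature weights.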

Themes: Model of Bottom-Up Saliency-Based Visual Attention, Computational Modeling