Abstract

L. Itti, Models of Bottom-Up and Top-Down Visual Attention, California Institute of Technology, Jan 2000. [Ph.D. Thesis] (Cited by 422)

Abstract: When we observe our visual environment, we do not perceive all its components as being equally interesting. Some objects automatically and effortlessly ``pop-out'' from their surroundings, that is, they draw our visual attention, in a ``bottom-up'' manner, towards them. In a first approximation, focal visual attention acts as a rapidly shiftable ``spotlight,'' which allows only the selected information to reach higher levels of processing and representation. Most models of the bottom-up control of attention are based on the concept of a saliency map, that is, an explicit two-dimensional map that encodes the conspicuity of objects in the visual environment. Competition among neurons in this map gives rise to a single winning location that corresponds to the next attended target. Inhibiting this location automatically allows the system to attend to the next most salient location. A first body of work in this thesis describes a detailed computer implementation of such a scheme, focusing on the problem of combining information across modalities, here orientation, intensity and color information, in a purely stimulus-driven manner. The model is applied to common psychophysical stimuli as well as to very demanding visual search tasks. Its successful performance is used to address the extent to which the primate visual system carries out visual search via one or more such saliency maps and how this can be tested. We next address the question of what happens once our attention is focused onto a restricted part of our visual field. There is mounting experimental evidence that attention is far more sophisticated than a simple feed-forward spatially-selective filtering process. Indeed, visual processing appears to be significantly different inside the attentional spotlight than outside.
That is, in addition to its properties as a feed-forward information processing and transmission bottleneck, focal visual attention feeds back and locally modulates, in a ``top-down'' manner, the visual processing and representation of selected objects. The second body of work presented in this thesis is concerned with a detailed computational model of basic pattern vision in humans and its modulation by top-down attention. We start by acquiring a complete dataset of five different simple psychophysical experiments, including discriminations of contrast, orientation and spatial frequency of simple pattern stimuli by human observers. This experimental dataset places strict constraints on our model of early pattern vision. The model, however, is eventually able to reproduce the entire dataset while assuming plausible neurobiological components. The model is further applied to existing psychophysical data that demonstrate how top-down attention alters performance in these simple psychophysical discrimination experiments. Our model is able to quantitatively account for all observations by assuming that attention strengthens the non-linear cortical interactions among visual neurons. Together, the two aspects of attention studied in this thesis lead us to consider the essential role of non-linear computations in visual processing. We suggest that visual processing, even at its earliest levels, is best characterized not by linear response functions and spatial convolutions, but rather by non-linearly interacting computational devices.
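
Illustrative sketch (not taken from the thesis): the saliency-map scanning loop described in the abstract, where a single winning location is selected and then inhibited so that the next most salient location can win, might look roughly like the Python below. The function name `scan_saliency_map`, the `radius` parameter, the square suppression region and the toy map are all assumptions made for illustration. A plain maximum search stands in for the neural competition, so this is not the thesis implementation.

```python
from typing import List, Tuple


def scan_saliency_map(
    saliency: List[List[float]], n_shifts: int, radius: int = 1
) -> List[Tuple[int, int]]:
    """Return successive attended (row, col) locations on a 2-D saliency map.

    Hypothetical helper: a maximum search replaces the competition among
    neurons, and a square neighbourhood is suppressed after each shift.
    """
    # Work on a copy so the caller's map is left untouched.
    s = [row[:] for row in saliency]
    rows, cols = len(s), len(s[0])
    attended: List[Tuple[int, int]] = []
    for _ in range(n_shifts):
        # Winner selection: the most salient location becomes the next target.
        r, c = max(
            ((i, j) for i in range(rows) for j in range(cols)),
            key=lambda p: s[p[0]][p[1]],
        )
        attended.append((r, c))
        # Inhibit the winner and its neighbourhood (assumed square region)
        # so the next most salient location can win on the following pass.
        for i in range(max(0, r - radius), min(rows, r + radius + 1)):
            for j in range(max(0, c - radius), min(cols, c + radius + 1)):
                s[i][j] = float("-inf")
    return attended


# Toy 5x5 map with three conspicuous spots (made-up values).
toy_map = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.2, 0.0, 0.0],
    [0.1, 0.2, 0.1, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.0, 0.3],
    [0.4, 0.0, 0.0, 0.0, 0.0],
]
scan = scan_saliency_map(toy_map, n_shifts=3)
print(scan)  # [(1, 1), (2, 4), (4, 0)]
```

In this toy run the attended locations are visited in order of decreasing saliency, because each winner is suppressed before the next selection.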

Keywords: Visual Attention; Bottom-Up; Top-Down; Modeling; Spatial Vision; Human Psychophysics; Neural Networks; Automatic Target Recognition (ATR); Visual Search; Eye Movements

Themes: Model of Bottom-Up Saliency-Based Visual Attention, Model of Top-Down Attentional Modulation, Computational Modeling, Human Psychophysics, Computer Vision