This page presents a few results obtained with the computer simulation of our model of bottom-up, saliency-based visual attention. The trajectory followed by the focus of attention is shown as a red broken line, with arrows indicating the order in which the various locations are attended. The focus of attention is represented by a yellow circle. Because the saliency map is encoded at a coarse spatial scale, the center of the focus of attention may appear slightly off an object of interest, even though it is that object that has been detected.
This model is under continuous development, and new computational strategies are tried almost every day. Consequently, these images may show attentional trajectories slightly different from those you may have seen from an earlier version of the model. All the images on this page were created with the same version of our model. No tuning was applied.
Images similar to these were used to test for expected behaviors of the model. For example, objects of similar shape that differ from the background in contrast or color are attended sequentially, in order of decreasing contrast or color difference, which in these cases are the only cues for saliency.
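The sequential ordering described above can be sketched in a few lines. This is a minimal illustration, not the model's implementation: it assumes a winner-take-all choice of the most salient location followed by suppression of the attended neighbourhood (inhibition of return), which this page does not spell out. The function name `attention_trajectory`, the `radius` parameter, and the toy map are hypothetical.

```python
def attention_trajectory(saliency, n_shifts, radius=1):
    """Return the (row, col) locations attended, most salient first.

    Assumption (not stated on this page): each attended location is
    suppressed afterwards so that attention moves on to the next one.
    """
    # Work on a copy so the caller's map is left untouched.
    s = [list(row) for row in saliency]
    rows, cols = len(s), len(s[0])
    path = []
    for _ in range(n_shifts):
        # Winner-take-all: pick the currently most salient location.
        r, c = max(((i, j) for i in range(rows) for j in range(cols)),
                   key=lambda p: s[p[0]][p[1]])
        path.append((r, c))
        # Suppress a square neighbourhood around the attended location.
        for i in range(max(0, r - radius), min(rows, r + radius + 1)):
            for j in range(max(0, c - radius), min(cols, c + radius + 1)):
                s[i][j] = float("-inf")
    return path


# Toy coarse saliency map: three objects of decreasing contrast.
toy = [[0.0] * 9 for _ in range(3)]
toy[1][1], toy[1][4], toy[1][7] = 0.9, 0.6, 0.3
print(attention_trajectory(toy, n_shifts=3))  # [(1, 1), (1, 4), (1, 7)]
```

The three toy objects are visited in order of decreasing saliency, which is the behavior the test images are meant to check.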
A noisy version of the classical 'pop-out' and 'conjunctive search' psychophysical experiments proposed by A. Treisman was simulated with our model. When a target object (here a bar) can be distinguished from the distractors (different bars) by one or more unique attributes (e.g. color or orientation), the measured time required by humans to find the target is nearly independent of the number of distractors in the image. Our model reproduced this behavior. However, when the target can only be distinguished from the distractors by a conjunction of attributes (e.g. in the right image below, the target is the only bar that is both red and oriented like the green bars), the time required by humans to find the target increases linearly with the number of distractors. Our model reproduced this result as well.
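The difference between the two search conditions can be illustrated with a toy score. The sketch below is not the model's computation: it assumes a hypothetical "uniqueness" score in which each bar is scored, per feature dimension, by how rare its value is in the display, and the scores are summed. All function names are illustrative.

```python
from collections import Counter


def uniqueness_saliency(items):
    """Score each item by how rare its value is in each feature dimension."""
    n = len(items)
    n_features = len(items[0])
    counts = [Counter(item[k] for item in items) for k in range(n_features)]
    return [sum(1.0 - counts[k][item[k]] / n for k in range(n_features))
            for item in items]


def target_rank(items, target_index=0):
    """1-based rank of the target when items are sorted by decreasing score."""
    scores = uniqueness_saliency(items)
    return 1 + sum(score > scores[target_index] for score in scores)


def pop_out_display(n_distractors):
    # The target differs from every distractor in color only.
    return [("red", "vertical")] + [("green", "vertical")] * n_distractors


def conjunction_display(n_distractors):
    # The target is the only bar that is both red and oriented like the green bars.
    half = n_distractors // 2
    return ([("red", "vertical")]
            + [("red", "horizontal")] * half
            + [("green", "vertical")] * (n_distractors - half))


for n in (4, 8, 16):
    print(n, target_rank(pop_out_display(n)), target_rank(conjunction_display(n)))
```

With this toy score the pop-out target ranks first whatever the number of distractors, while the conjunction target's rank grows with the number of distractors because no single feature sets it apart. The toy ranks the conjunction target last, which exaggerates the effect; it is only meant to show why a serial scan becomes necessary.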
In a first set of natural scenes, the task is to find a red can in environments exhibiting strong, distracting contrasts in luminance, chrominance, and orientation.
We tested how the addition of strong speckle noise influences the performance of the model. In these examples, a large amount of noise could be added before performance degraded, as long as the noise did not directly interact with the target (e.g. here, in the red-green channel). When the noise interfered directly with the target, however, the saliency of the target decreased (e.g. here, when numerous locations in the image become red, the red target is no longer particularly conspicuous).
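The distinction between noise that does and does not interact with the target can be sketched with a simple red-green difference. This is an illustration under stated assumptions: this page does not specify how the noise was generated or how the red-green channel is computed, so the `r - g` channel, both noise functions, and the `competitors` measure are hypothetical. Pixel values are left unclipped for simplicity.

```python
import random


def red_green(pixel):
    """Assumed red-green opponency value of an (r, g, b) pixel."""
    r, g, b = pixel
    return r - g


def add_achromatic_noise(image, amplitude, rng):
    """Add the same random offset to r, g, and b of every pixel."""
    noisy = []
    for (r, g, b) in image:
        n = rng.uniform(-amplitude, amplitude)
        noisy.append((r + n, g + n, b + n))
    return noisy


def add_red_speckles(image, fraction, rng):
    """Replace a random fraction of the pixels with pure red."""
    return [(1.0, 0.0, 0.0) if rng.random() < fraction else p for p in image]


def competitors(image, target_index):
    """Count non-target pixels at least as red-green as the target."""
    t = red_green(image[target_index])
    return sum(1 for i, p in enumerate(image)
               if i != target_index and red_green(p) >= t - 1e-9)


rng = random.Random(0)
image = [(0.5, 0.5, 0.5)] * 400      # gray background, flattened to a list
image[200] = (1.0, 0.0, 0.0)         # single red target

gray_noise = add_achromatic_noise(image, 0.4, rng)
red_noise = add_red_speckles(image, 0.2, rng)
print(competitors(gray_noise, 200), competitors(red_noise, 200))
```

Achromatic noise shifts r and g together, so the red-green value of each pixel is unchanged and the target keeps no competitors. Red speckles create many locations as red as the target, so it no longer stands out in that channel.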
Performance of the model is difficult to evaluate quantitatively on natural images. In general, however, we obtained good agreement between our own perception of the salient locations in an image and the attentional trajectory generated by the model.
We compared model search times with the average search time of 62 human observers on the "Search_2" dataset of high-resolution (6144x4096) images from Lex Toet at TNO-Human Factors Research Institute in the Netherlands. In the examples shown here, the model found the target immediately (first or second attended location), outperforming humans. When the number of locations visited by humans was estimated from their average search time, our model appeared to outperform humans (i.e., to find the target after fewer attentional shifts) in 75% of the images studied.
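The comparison described above can be sketched as follows. This is only an illustration: this page does not state how human search times were converted into a number of visited locations, so the constant `seconds_per_shift` is a hypothetical placeholder, and the numbers in the example are made up and are not the Search_2 measurements.

```python
def estimated_human_shifts(search_time_s, seconds_per_shift):
    """Estimate how many locations a human visited from the search time.

    Assumption: each attentional shift takes a fixed time (placeholder).
    """
    return search_time_s / seconds_per_shift


def fraction_model_wins(model_shifts, human_times_s, seconds_per_shift):
    """Fraction of images where the model needed fewer shifts than humans."""
    wins = sum(m < estimated_human_shifts(t, seconds_per_shift)
               for m, t in zip(model_shifts, human_times_s))
    return wins / len(model_shifts)


# Made-up example values, one entry per image.
model_shifts = [1, 2, 10, 9]           # attended locations before the target
human_times_s = [3.0, 1.5, 2.0, 4.0]   # average human search time (seconds)

print(fraction_model_wins(model_shifts, human_times_s, seconds_per_shift=0.5))  # 0.5
```

The result depends directly on the assumed time per shift, so any such comparison should state that value alongside the reported fraction.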
Copyright © 2000 by the University of Southern California, iLab and Prof. Laurent Itti