Abstract


S. Egner, L. Itti, C. R. Scheier, Comparing attention models with different types of behavior data, In: Investigative Ophthalmology and Visual Science (Proc. ARVO 2000), Vol. 41, No. 4, p. S39, Mar 2000. (Cited by 17)

Abstract: Purpose: While looking at an image, a human observer generates a sequence of attentional shifts between different image locations. Different models of visual attention attempt to predict these shifts. The goal of our ongoing project is to evaluate different models by comparing their predicted shifts of attention to the shifts produced by human observers. Methods: The main challenge for our study is that attention models typically predict covert shifts of attention, which cannot be measured directly from an observer's behavior. What can be measured, for instance eye movements, is always the result of both response-specific factors (e.g. the oculo-motor system) and non-response-specific factors (e.g. attention). To infer the non-response-specific factors, we recorded different types of responses (eye movements, finger pointing, and mouse clicks) for the same stimuli. Each stimulus (a search/pop-out display, a natural scene, or a web page) was presented for four seconds. Responses that correlated highly across modalities were assumed to reflect attentional processes. The behavioral data were transformed into image coordinates, where they could be compared with the models' predictions. We used a saliency measure based on local feature contrast as a baseline model, together with the model by Itti and Koch (1998). Other models can be integrated in the same way. We computed how well the distribution of responses from one response system could predict the responses from another response system or from a model. Results: (1) Distributions produced by different response systems are highly correlated. (2) The similarity between responses and model predictions depends strongly on the stimulus category, but for all stimulus categories the model by Itti and Koch produces better predictions than the baseline model. Conclusions: The high similarity between response modalities indicates that the responses reflect a common underlying, presumably attentional, process. We suggest that mouse clicks are a particularly easy way to gather attentional data. The model by Itti and Koch is preferable to the baseline model. Some model improvements that would lead to even higher agreement with our empirical data are discussed.
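
As a rough illustration of the comparison described in the Methods, the sketch below bins responses given in image coordinates into a coarse grid and correlates the resulting distributions. The abstract does not state which similarity measure was used, so the Pearson correlation, the 8 x 8 grid, the function names, and the sample coordinates are all assumptions made for illustration only, not the authors' procedure or data.

```python
from math import sqrt

def response_map(points, width, height, cells=8):
    """Bin (x, y) responses in image coordinates into a cells x cells grid,
    returned as a flat list of counts (row by row)."""
    grid = [0.0] * (cells * cells)
    for x, y in points:
        if not (0 <= x < width and 0 <= y < height):
            continue  # skip responses that fall outside the image
        col = int(x * cells / width)
        row = int(y * cells / height)
        grid[row * cells + col] += 1.0
    return grid

def pearson(a, b):
    """Pearson correlation between two equally long lists of counts.
    Returns 0.0 when either list has no variance."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

# Hypothetical responses on a 640 x 480 stimulus (invented for illustration).
WIDTH, HEIGHT = 640, 480
eye_points = [(100, 130), (110, 140), (400, 300), (410, 310)]
mouse_points = [(98, 125), (402, 305), (405, 310)]
model_points = [(600, 50), (610, 55)]  # locations a hypothetical model marks as salient

eye_map = response_map(eye_points, WIDTH, HEIGHT)
mouse_map = response_map(mouse_points, WIDTH, HEIGHT)
model_map = response_map(model_points, WIDTH, HEIGHT)

eye_vs_mouse = pearson(eye_map, mouse_map)
eye_vs_model = pearson(eye_map, model_map)
print("eye vs mouse:", eye_vs_mouse)
print("eye vs model:", eye_vs_model)
```

In this toy data the two response modalities cluster in the same grid cells and therefore correlate more strongly with each other than with the hypothetical model map. A continuous saliency map could be compared in the same way by averaging it over the same grid cells before correlating.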

Themes: Model of Bottom-Up Saliency-Based Visual Attention, Computational Modeling, Human Psychophysics