Abstract


R. Carmi, L. Itti, Disentangling topdown from bottom up influences on attentional allocation in dynamic scenes, In: Proc. 11th Joint Symposium on Neural Computation (JSNC'04), Los Angeles, California, May 2004. (Cited by 4)

Abstract: Motivation: Attentional allocation is determined by the interplay between bottom-up and top-down influences. Here we try to quantify the relative contributions of different influences on attentional allocation in dynamic scenes, and to examine how they change over time.
Methods: To manipulate the availability of top-down influences on attentional allocation, heterogeneous video clips were cut into clippets (M=2s), which were scrambled and re-assembled into MTV-style clips. Two groups of 8 subjects each were instructed to "follow the main actors and actions." One group viewed the original stimuli while the other group viewed the MTV-style clips. Eye positions were recorded using an ISCAN eye-tracker (240Hz, yielding a total of more than a million samples for each group), and segmented into saccades, blinks, and fixation/smooth pursuit periods. A saliency-based model of attention capture (Itti & Koch 2000) was used to probe the relative contribution of bottom-up influences on attentional allocation, based on a novel performance metric: Chance-Adjusted Saliency Accumometric (CASA). CASA values were computed as the weighted sum of differences between normalized saliency at human vs. random saccade targets.
Results: Total CASA based on the full saliency model was 6 percent higher in the MTV group than in the original group. In both the original and MTV groups, CASA based on either motion or flicker features alone was 95 percent of the CASA based on the full saliency model. CASA based on either color, intensity, or orientation features alone was 66 percent of the full-model CASA. Generally, CASA values for earlier saccades after stimulus onset (clip or clippet start) were higher than for later saccades, but they tapered off and fluctuated around a fairly high value after the first several saccades.
Conclusions: The 6 percent CASA difference between the original and MTV groups shows that eliminating visual context beyond the first 2s of viewing barely increased the overall relative weight of bottom-up influences on attentional allocation. Our results imply that the relative weight of top-down influences on attentional allocation in dynamic scenes does not increase with viewing time (beyond the first 2s). We also found that motion or flicker alone is roughly 1.4 times as strong as color, intensity, or orientation alone as a bottom-up attractor of attention (95 vs. 66 percent of the full-model CASA).
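
The abstract describes CASA only as a weighted sum of differences between normalized saliency at human vs. random saccade targets. The following is a minimal sketch of that idea, not the authors' implementation. The normalization (dividing by the map maximum), the equal default weights, the uniform sampling of random targets, and the names `normalized_saliency` and `casa_sketch` are all assumptions made for illustration.

```python
import random


def normalized_saliency(saliency_map, target):
    """Saliency at (row, col) divided by the map maximum (assumed normalization)."""
    row, col = target
    peak = max(max(r) for r in saliency_map)
    return saliency_map[row][col] / peak if peak > 0 else 0.0


def casa_sketch(saliency_maps, human_targets, weights=None, seed=0):
    """Weighted sum of (human - random) normalized saliency, one term per saccade.

    saliency_maps: one 2-D list per saccade (the map at saccade onset).
    human_targets: one (row, col) landing point per saccade.
    weights: hypothetical per-saccade weights; equal weights if omitted.
    """
    n = len(human_targets)
    if n == 0:
        return 0.0
    if weights is None:
        weights = [1.0 / n] * n  # assumption: the abstract gives no weighting scheme
    rng = random.Random(seed)  # seeded so the random baseline is reproducible
    total = 0.0
    for smap, target, w in zip(saliency_maps, human_targets, weights):
        # Assumption: the random saccade target is drawn uniformly over the map.
        rand_target = (rng.randrange(len(smap)), rng.randrange(len(smap[0])))
        human = normalized_saliency(smap, target)
        chance = normalized_saliency(smap, rand_target)
        total += w * (human - chance)
    return total


# Toy usage: a 3x3 map with a single salient location at (1, 1).
peak_map = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
score = casa_sketch([peak_map] * 5, [(1, 1)] * 5)
```

Under these assumptions, a score near zero means human saccades land on locations no more salient than chance, while a positive score means they land on more salient locations than random targets do.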

Themes: Model of Bottom-Up Saliency-Based Visual Attention, Model of Top-Down Attentional Modulation, Human Psychophysics, Human Eye-Tracking Research