Abstract: Are we more likely to consciously register, remember, and report elements in a visual scene which are more conspicuous or salient? Focal visual attention is known to gate low-level visual information into higher-level processing, short-term memory, and consciousness. However, merely being attracted to something salient and attending to it does not guarantee that it will be retained in the conscious mental representation of a scene. Here we provide preliminary experimental evidence that, in dynamic natural scenes, out of all objects, actors, and actions which are attended to, those which are verbally reported are also more bottom-up salient than those which are not. Using an eye-tracker, we recorded the gaze of one human participant watching twelve 30-second television clips, together with his online verbal descriptions of the scenes depicted in the clips. Eye movement traces were segmented into periods of fixation and saccadic gaze shifts. We manually isolated saccades towards each entity that had been reported verbally. Using a computational model of bottom-up visual salience, we computed dynamic salience maps for all clips. We compared the distribution of instantaneous salience at human saccade targets to that at random targets using the Kullback-Leibler (KL) distance; KL scores above zero indicate that visual salience attracted gaze more than expected by chance. Our findings are threefold. First, visual salience significantly attracted gaze overall, as scene locations saccaded to by the observer were reliably more salient than expected by chance (KL=0.194+/-0.019, n=992 saccades, t-test, p<10^-27). Second, restricting the analysis to human saccades directed towards objects, actors, or actions mentioned in the verbal report yielded a higher score (KL=0.372+/-0.055, n=319, p<10^-13), indicating that more salient scene elements were more likely to be reported. Third, restricting the analysis to only the first saccade onto each of the 88 different reported scene elements yielded a still higher score (KL=0.546+/-0.120, n=88, p<0.0008), suggesting that the instantaneous salience of a scene element when first gazed at may significantly influence whether it will be reported. In sum, our study suggests that, out of all the targets of human gaze over complex dynamic scenes, those which emerge as the central elements in the conscious representation of the scene are more bottom-up salient, supporting a role for bottom-up salience in facilitating entry into conscious mental scene representations.
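
To illustrate the KL comparison described in the abstract, the following is a minimal sketch of how a score of this kind could be computed from salience values sampled at human saccade targets and at random targets. The abstract does not specify the estimator, so the histogram binning, the number of bins, the smoothing constant, and the function name `kl_score` are all assumptions made for illustration; salience values are assumed to be normalized to [0, 1], and the data in the usage example are synthetic.

```python
import numpy as np

def kl_score(human_salience, random_salience, n_bins=10, eps=1e-12):
    """Histogram-based KL divergence D(human || random), in nats.

    Both inputs are 1-D arrays of salience values assumed to lie in [0, 1].
    n_bins and eps are illustrative choices, not values from the study.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    p, _ = np.histogram(human_salience, bins=edges)
    q, _ = np.histogram(random_salience, bins=edges)
    # Convert counts to probabilities; eps avoids log(0) for empty bins.
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic usage example (not data from the study): random targets sample
# salience uniformly, "human" targets are skewed towards high salience.
rng = np.random.default_rng(0)
random_salience = rng.uniform(0.0, 1.0, size=1000)
human_salience = rng.beta(2.0, 1.0, size=1000)
score = kl_score(human_salience, random_salience)
identical = kl_score(random_salience, random_salience)
```

A score near zero means the two salience distributions are indistinguishable, while larger positive values mean saccade targets differ more from chance. The sketch leaves out the error estimates and significance tests reported in the abstract.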

Themes: Model of Bottom-Up Saliency-Based Visual Attention, Computational Modeling, Scene Understanding, Human Eye-Tracking Research


