Abstract

L. Itti, Keynote lecture: Computational modeling of bottom-up and top-down visual attention in complex dynamic environments, European Conference on Eye Movements (ECEM 2015), Vienna, Austria, Aug 2015.

Abstract: Visual attention and eye movements in primates have been widely shown to be guided by a combination of stimulus-dependent or 'bottom-up' cues, as well as task-dependent or 'top-down' cues. Both the bottom-up and top-down aspects of attention and eye movements have been modeled computationally. Yet it is not until recent work, which I will describe, that bottom-up models have been rigorously put to the test, predicting, significantly above chance, the eye movement patterns, functional neuroimaging activation patterns, or most recently neural activity in the superior colliculus of human or monkey participants inspecting complex static or dynamic scenes. In recent developments, models that increasingly attempt to capture top-down aspects have been proposed. In one system, which I will describe, neuromorphic algorithms of bottom-up visual attention are employed to predict, in a task-independent manner, which elements in a video scene might more strongly attract attention and gaze. These bottom-up predictions have more recently been combined with top-down predictions, which allowed the system to learn from examples (recorded eye movements and actions of humans engaged in 3D video games, including flight combat, driving, first-person, or running a hot-dog stand that serves hungry customers) how to prioritize particular locations of interest given the task. Pushing deeper into real-time, joint online analysis of video and eye movements using neuromorphic models, we have recently been able to predict future gaze locations and intentions of future actions when a player is engaged in a task. In a similar approach, where computational models provide a normative gold standard against which a particular individual's gaze behavior can be compared, machine learning systems have been demonstrated that can predict, from eye movement recordings during 15 minutes of watching TV, whether a person has ADHD or other neurological disorders.
Together, these studies suggest that it is possible to build fully computational models that coarsely capture some aspects of both bottom-up and top-down visual attention.

Themes: Model of Bottom-Up Saliency-Based Visual Attention, Model of Top-Down Attentional Modulation, Computational Modeling