Abstract


A. Borji, Hamed R. Tavakoli, D. N. Sihite, L. Itti, Analysis of scores, datasets, and models in visual saliency prediction, In: Proc. International Conference on Computer Vision (ICCV), Sydney, Australia, Dec 2013. [2013 acceptance rate (oral presentation): 2.52%] (Cited by 269)

Abstract: Models of visual saliency have become important components of many vision systems for several applications. Significant recent progress has been made in developing high-quality saliency models. However, less effort has been devoted to fair assessment of these models over large standardized datasets while correctly addressing confounding factors. In this study, we take a critical and quantitative look at challenges in saliency modeling (e.g., center-bias, map smoothing) and the way they affect model accuracy. We quantitatively compare 32 state-of-the-art models (using the shuffled AUC score to discount center-bias) on 4 benchmark eye movement datasets, for prediction of human fixation locations and scanpath sequences. We also account for the role of map smoothing. We find that, although model rankings vary, some models (e.g., AWS, AIM, and HouNIPS) consistently outperform the others over all datasets. Some models work well for prediction of both fixation locations and scanpath sequences (e.g., Judd, GBVS). Our results show low prediction accuracy of models over emotional stimuli from the NUSEF dataset. Our last benchmark, for the first time, gauges the ability of models to decode the stimulus category from statistics of fixations, saccades, and model salience values at fixated locations. In this test, the ITTI and AIM models outperform the other models. Our benchmark provides a comprehensive high-level picture of the strengths and weaknesses of many popular models, and suggests future research directions in saliency modeling.
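The abstract names the shuffled AUC score as its way of discounting center-bias but does not spell out the computation. The following is a minimal sketch of the general idea, not the authors' implementation: saliency values at the fixations on the current image serve as positives, and saliency values at fixation locations taken from other images serve as negatives, so a map that only encodes a central prior scores near chance. The function name `shuffled_auc`, its arguments, and the rank-based (Mann-Whitney) way of computing the AUC are all assumptions made for illustration; published implementations typically also subsample negatives and average over repeated draws.

```python
import numpy as np

def shuffled_auc(saliency_map, fixations, other_fixations):
    """Hypothetical helper: AUC with negatives drawn from other images' fixations.

    saliency_map    : 2-D array of model saliency values.
    fixations       : (row, col) pairs fixated on this image (positives).
    other_fixations : (row, col) pairs fixated on other images (negatives).
    """
    sal = np.asarray(saliency_map, dtype=float)
    fix = np.asarray(fixations, dtype=int)
    oth = np.asarray(other_fixations, dtype=int)

    pos = sal[fix[:, 0], fix[:, 1]]  # saliency at true fixations
    neg = sal[oth[:, 0], oth[:, 1]]  # saliency at shuffled fixations

    # AUC as the probability that a positive outranks a negative,
    # counting ties as one half (equivalent to the Mann-Whitney U statistic).
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

Because both positives and negatives come from human fixations, both share the same center-bias; a score of 0.5 means the map cannot tell this image's fixations apart from fixations made on other images.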

Themes: Model of Bottom-Up Saliency-Based Visual Attention, Computational Modeling, Computer Vision