Abstract

T. N. Mundhenk, C. Landauer, K. Bellman, M. A. Arbib, L. Itti, Teaching the computer subjective notions of feature connectedness in a visual scene for real time vision, In: Proc. SPIE Conference on Intelligent Robots and Computer Vision XXII: Algorithms, Techniques, and Active Vision, (D. P. Casasent, E. L. Hall, J. Roning Eds.), Vol. 5608, pp. 136-147, Bellingham, WA: SPIE Press, Oct 2004. (Cited by 3)

Abstract: We discuss a tool kit for use in scene understanding where prior information about targets is not necessarily understood. As such, we give it a notion of connectivity such that it can classify features in an image for the purpose of tracking and identification. The tool, VFAT (Visual Feature Analysis Tool), is designed to work in real time in an intelligent multi-agent room. It is built around a modular design and includes several fast vision processes. The first components discussed are for feature selection using visual saliency and Monte Carlo selection. Features that have been selected from an image are then mixed into useful and more complex features. All the features are then reduced in dimension and contrasted using a combination of Independent Component Analysis and Principal Component Analysis (ICA/PCA). Once this has been done, we classify features using a custom non-parametric classifier (NPclassify) that does not require hard parameters such as class size or number of classes, so that VFAT can create classes without stringent priors about class structure. These classes are then generalized using Gaussian regions, which allow easier storage of class properties and computation of probability for class matching. To speed up the creation of Gaussian regions we use a system of rotations instead of the traditional pseudo-inverse method. In addition to discussing the structure of VFAT, we discuss training of the current system, which is relatively easy to perform. ICA/PCA is trained by giving VFAT a large number of random images. The ICA/PCA matrix is computed from features extracted by VFAT. The non-parametric classifier NPclassify is trained by presenting it with images of objects and having it decide how many objects it thinks it sees. The difference between what it sees and what it is supposed to see, in terms of the number of objects, is used as the error term and allows VFAT to learn to classify based upon the experimenter's subjective idea of good classification.
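
The count-based error term described at the end of the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `count_error` and `mean_abs_count_error`, the representation of a classifier's output as a list of per-feature class labels, and the sample data are all assumptions made for illustration. The abstract only states that the difference between the number of objects found and the number expected serves as the error.

```python
def count_error(predicted_labels, expected_count):
    """Signed difference between the number of classes found and the number expected.

    predicted_labels: one class label per feature (hypothetical classifier output).
    expected_count:   number of objects the experimenter says are in the image.
    """
    found = len(set(predicted_labels))
    return found - expected_count


def mean_abs_count_error(samples):
    """Average absolute count error over (predicted_labels, expected_count) pairs."""
    if not samples:
        raise ValueError("need at least one training sample")
    return sum(abs(count_error(labels, n)) for labels, n in samples) / len(samples)


# Hypothetical data: per-feature class labels for three images, paired with
# the number of objects the experimenter judges each image to contain.
samples = [
    ([0, 0, 1, 1, 2], 3),  # 3 classes found, 3 expected
    ([0, 0, 0, 1], 3),     # 2 classes found, 3 expected
    ([0, 1, 2, 3], 2),     # 4 classes found, 2 expected
]

print(mean_abs_count_error(samples))
```

A training loop could, for instance, adjust the classifier's soft parameters to reduce this average; how VFAT actually uses the error is not specified in the abstract.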

Themes: Model of Bottom-Up Saliency-Based Visual Attention, Beobots, Computational Modeling, Computer Vision