iLab Neuromorphic Vision C++ Toolkit Overview
What is Neuromorphic Vision?
In recent years, a new discipline has emerged which challenges
classical approaches to engineering and computer vision research:
neuromorphic engineering. This new research effort
promises to develop engineering systems with unparalleled robustness
and on-line adaptability. Such systems are based on algorithms and
techniques inspired by, and closely replicating, the principles of
information processing in biological nervous systems. Their
applicability to engineering challenges is widespread, and includes
smart sensors, implanted electronic devices, autonomous
visually-guided robotic systems, prosthetic systems, and robust
human-computer interfaces.
Neuromorphic engineering proposes to fill the gap between, on the
one hand, computational neuroscience, and, on the other hand,
traditional engineering. Computational neuroscience has yielded very
useful models and theories of brain function, but too often these have
been restricted to simplified conditions, stimuli and tasks, to allow
direct comparison with simple empirical measurements on biological
systems. Also, computational neuroscience models typically are
concerned with testing one given hypothesis, and thus are not intended
to solve real-world problems, but rather to advance our understanding
of brain function through hypothesis testing. Consequently,
computational neuroscience models often do not scale up to more
complex stimuli, environments, or task conditions. In contrast,
engineering has focused on developing systems that can solve actual
real-world problems; however, because general problems such as
recognizing objects in a digital image or driving a vehicle from one
city to another are incredibly complex, engineering solutions have
often also been explicitly restricted to simplified environments and
tasks (e.g., start by recognizing letters on a page of text before
attacking the broader problem of general object recognition, or by
driving a robot in a corridor of known width before driving it on any
indoor or outdoor terrain). Because many animals can solve problems
like object recognition or basic navigation in unconstrained
environments, the promise is that developing full-scale engineering
systems based on biological information processing principles may
provide a new avenue for transcending the limitations of both
traditional computational neuroscience and traditional
engineering.
What is the goal of the toolkit?
Figure: A neuromorphic robot
Because of its truly interdisciplinary nature, benefiting from the
latest advances in experimental and computational neuroscience,
electrical engineering, control theory, and signal and image
processing, neuromorphic engineering is a very complex field. Thus, one
motivation for the development of a Neuromorphic Vision Toolkit is to
provide a set of basic tools which can assist newcomers in the field
with the development of new models and systems.
More generally, the iLab Neuromorphic Vision C++ Toolkit project
aims at developing the next generation of vision algorithms, closely
modeled after the neurobiology of the primate brain rather than
being specifically developed for given environmental conditions or
tasks. To this end, it provides a software foundation that is
specifically geared towards the development of neuromorphic models and
systems.
Briefly, what are the main high-level components of the toolkit?
At the core of the toolkit are a number of neuroscience models,
initially developed to provide greater understanding of biological
vision processing, but here made ready to be applied to engineering
challenges such as visually-guided robotics in outdoor environments.
Taken together, these models provide general-purpose vision modules
that can be easily reconfigured and tuned for specific tasks. The
overall driving architecture for a general vision system underlying
many of the modules available in the toolkit is shown in the figure
(also see our recent paper on
this topic).
Input video, captured by a camera or obtained from other sources, is
first processed by a bank of low-level visual
feature detectors, sensitive to image properties such as local
contrast, orientation or motion energy. These feature detectors mimic
the known response properties of early visual neurons in the retina,
lateral geniculate nucleus of the thalamus, and primary visual cortex.
Subsequent visual processing is then split into two cooperating
streams: one is concerned with the rapid computation of the ``gist''
and layout of the scene, and provides coarse clues by which the system
obtains a sense of the environmental conditions (e.g., indoors
vs. outdoors, on a track vs. off-road) and of its position within the
environment (e.g., path is turning left, the scene is highly
cluttered). The second stream is concerned with orienting attention
and the eyes towards the few most visually conspicuous objects in the
scene. This stage relies on a neural saliency map, which gives a
graded measure of ``attractiveness'' to every location in the scene
and is modeled after the neural architecture of posterior parietal
cortex in the monkey brain. At any given point in time, the system
uses the gist for basic orienting in the scene, and sequentially
attends to interesting objects (which could be obstacles, landmarks to
aid navigation, or target objects being looked for). Several neural
models are available in the toolkit for the implementation of the next
processing stage, concerned with identifying the object that has drawn
attention and the eyes, and most of these models are inspired by the
visual response properties of neurons in infero-temporal cortex.
Finally, additional modules are available for short-term and long-term
memory, cognitive knowledge representation, and modulatory feedback
from a high-level task definition (e.g., look for the stop sign) to
the low-level visual processing (e.g., emphasize the contribution of
red to the saliency map, prime the object recognition module for
the ``traffic sign'' object class).
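To make this processing flow more concrete, the sketch below is a
minimal, self-contained C++ illustration of the saliency-map idea
described above: several feature maps are combined into a single graded
map of ``attractiveness'', a winner-take-all step selects the next
attended location, and a top-down weight biases one feature channel
(e.g., red) according to the current task. The FeatureMap structure and
the function names are purely illustrative assumptions for this sketch
and are not the toolkit's actual classes or API.

  #include <cstddef>
  #include <iostream>
  #include <vector>

  // A feature map: one scalar response per image location (e.g., local
  // contrast, a given orientation, or motion energy).
  struct FeatureMap {
    std::size_t width, height;
    std::vector<float> data;   // row-major, width * height responses
    float weight;              // top-down gain (task modulation)
  };

  // Combine weighted feature maps into a single saliency map, the graded
  // measure of ``attractiveness'' of every location in the scene.
  std::vector<float> computeSaliency(const std::vector<FeatureMap>& maps) {
    if (maps.empty()) return {};
    std::vector<float> saliency(maps.front().data.size(), 0.0f);
    for (const FeatureMap& m : maps)
      for (std::size_t i = 0; i < saliency.size(); ++i)
        saliency[i] += m.weight * m.data[i];
    return saliency;
  }

  // Winner-take-all: index of the most salient location, i.e. where the
  // system would orient attention next.
  std::size_t attend(const std::vector<float>& saliency) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < saliency.size(); ++i)
      if (saliency[i] > saliency[best]) best = i;
    return best;
  }

  int main() {
    const std::size_t w = 4, h = 4;
    // Two hypothetical feature maps on a 4x4 grid.
    FeatureMap red  {w, h, std::vector<float>(w * h, 0.1f), 1.0f};
    FeatureMap edge {w, h, std::vector<float>(w * h, 0.2f), 1.0f};
    red.data[5] = 0.9f;   // one strongly red location

    // Top-down task modulation (e.g., ``look for the stop sign''):
    // emphasize the contribution of red to the saliency map.
    red.weight = 2.0f;

    std::vector<float> sal = computeSaliency({red, edge});
    std::size_t winner = attend(sal);
    std::cout << "attend to location (" << winner % w << ", "
              << winner / w << ")" << std::endl;
    return 0;
  }

In the actual saliency model, the feature maps also undergo
center-surround filtering and spatial competition/normalization before
being combined, and the winner-take-all is implemented as a dynamical
neural network with inhibition of return rather than a simple argmax;
those steps are omitted here for brevity.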
Not all of the components shown in the figure have been fully
implemented; many are at a very preliminary stage of development, and
some do not yet exist. The interesting point to note
already at this stage, however, is how the biologically-inspired
visual system architecture proposed here is very different from
typical robot vision and computer vision systems, which are usually designed
to solve a specific problem (e.g., find a stop sign by looking for its
specific shape using an algorithm matched to its exact geometrical
properties). This promises to make the systems developed around this
architecture particularly capable when dealing with novel, complex
outdoor scenes and unexpected situations, as has been widely
demonstrated by, for example, our model of bottom-up attention.
Figure: Architecture overview