iLab Neuromorphic Vision C++ Toolkit Overview
What is Neuromorphic Vision?
In recent years, a new discipline has emerged which challenges
classical approaches to engineering and computer vision research:
neuromorphic engineering. This new research effort
promises to develop engineering systems with unparalleled robustness
and on-line adaptability. Such systems are based on algorithms and
techniques inspired by, and closely replicating, the principles of
information processing in biological nervous systems. Their
applicability to engineering challenges is widespread, and includes
smart sensors, implanted electronic devices, autonomous
visually-guided robotic systems, prosthetic systems, and robust
human-computer interfaces.
Neuromorphic engineering proposes to fill the gap between, on the
one hand, computational neuroscience, and, on the other hand,
traditional engineering. Computational neuroscience has yielded very
useful models and theories of brain function, but too often these have
been restricted to simplified conditions, stimuli and tasks, to allow
direct comparison with simple empirical measurements on biological
systems. Also, computational neuroscience models typically are
concerned with testing one given hypothesis, and thus are not intended
to solve real-world problems, but rather to advance our understanding
of brain function through hypothesis testing. Consequently,
computational neuroscience models often do not scale up to more
complex stimuli, environments, or task conditions. In contrast,
engineering has focused on developing systems that can solve actual
real-world problems; however, because general problems such as
recognizing objects in a digital image or driving a vehicle from one
city to another are incredibly complex, engineering solutions have
often also been explicitly restricted to simplified environments and
tasks (e.g., start by recognizing letters on a page of text before
attacking the broader problem of general object recognition, or by
driving a robot in a corridor of known width before driving it on any
indoor or outdoor terrain). Because many animals can solve problems
like object recognition or basic navigation in unconstrained
environments, the promise is that developing full-scale engineering
systems based on biological information processing principles may
provide a new avenue for transcending the limitations of both
traditional computational neuroscience and traditional
engineering.
What is the goal of the toolkit?
Figure: A neuromorphic robot
Because of its truly interdisciplinary nature, benefiting from the
latest advances in experimental and computational neuroscience,
electrical engineering, control theory, and signal and image
processing, neuromorphic engineering is a very complex field. Thus, one
motivation for the development of a Neuromorphic Vision Toolkit is to
provide a set of basic tools which can assist newcomers in the field
with the development of new models and systems.
More generally, the iLab Neuromorphic Vision C++ Toolkit project
aims at developing the next generation of vision algorithms, closely
modeled after the neurobiology of the primate brain rather than
being specifically developed for given environmental conditions or
tasks. To this end, it provides a software foundation that is
specifically geared towards the development of neuromorphic models and
systems.
Briefly, what are the main high-level components of the toolkit?
At the core of the toolkit are a number of neuroscience models,
initially developed to provide greater understanding of biological
vision processing, but here made ready to be applied to engineering
challenges such as visually-guided robotics in outdoor environments.
Taken together, these models provide general-purpose vision modules
that can be easily reconfigured and tuned for specific tasks. The
overall driving architecture for a general vision system underlying
many of the modules available in the toolkit is shown in the figure
(also see our recent paper on
this topic).
Input video, captured by a camera or obtained from other sources, is
first processed by a bank of low-level visual
feature detectors, sensitive to image properties such as local
contrast, orientation or motion energy. These feature detectors mimic
the known response properties of early visual neurons in the retina,
lateral geniculate nucleus of the thalamus, and primary visual cortex.
Subsequent visual processing is then split into two cooperating
streams: one is concerned with the rapid computation of the ``gist''
and layout of the scene, and provides coarse clues by which the system
obtains a sense of the environmental conditions (e.g., indoors
vs. outdoors, on a track vs. off-road) and of its position within the
environment (e.g., path is turning left, the scene is highly
cluttered). The second stream is concerned with orienting attention
and the eyes towards the few most visually conspicuous objects in the
scene. This stage relies on a neural saliency map, which gives a
graded measure of ``attractiveness'' to every location in the scene
and is modeled after the neural architecture of posterior parietal
cortex in the monkey brain. At any given point in time, the system
uses the gist for basic orienting in the scene, and sequentially
attends to interesting objects (which could be obstacles, landmarks to
aid navigation, or target objects being looked for). Several neural
models are available in the toolkit for the implementation of the next
processing stage, concerned with identifying the object that has drawn
attention and the eyes, and most of these models are inspired by the
visual response properties of neurons in infero-temporal cortex.
Finally, additional modules are available for short-term and long-term
memory, cognitive knowledge representation, and modulatory feedback
from a high-level task definition (e.g., look for the stop sign) to
the low-level visual processing (e.g., emphasize the contribution of
red to the saliency map, prime the object recognition module for
the ``traffic sign'' object class).
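To make this processing flow more concrete, the sketch below is a
minimal, self-contained C++ illustration of the saliency-map idea
described above: several feature maps are combined into a single graded
map of ``attractiveness'', a winner-take-all step selects the next
attended location, and a top-down weight biases one feature channel
(e.g., red) according to the current task. The FeatureMap structure and
the function names are purely illustrative assumptions for this sketch
and are not the toolkit's actual classes or API.

  #include <cstddef>
  #include <iostream>
  #include <vector>

  // A feature map: one scalar response per image location (e.g., local
  // contrast, a given orientation, or motion energy).
  struct FeatureMap {
    std::size_t width, height;
    std::vector<float> data;   // row-major, width * height responses
    float weight;              // top-down gain (task modulation)
  };

  // Combine weighted feature maps into a single saliency map, the graded
  // measure of ``attractiveness'' of every location in the scene.
  std::vector<float> computeSaliency(const std::vector<FeatureMap>& maps) {
    if (maps.empty()) return {};
    std::vector<float> saliency(maps.front().data.size(), 0.0f);
    for (const FeatureMap& m : maps)
      for (std::size_t i = 0; i < saliency.size(); ++i)
        saliency[i] += m.weight * m.data[i];
    return saliency;
  }

  // Winner-take-all: index of the most salient location, i.e. where the
  // system would orient attention next.
  std::size_t attend(const std::vector<float>& saliency) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < saliency.size(); ++i)
      if (saliency[i] > saliency[best]) best = i;
    return best;
  }

  int main() {
    const std::size_t w = 4, h = 4;
    // Two hypothetical feature maps on a 4x4 grid.
    FeatureMap red  {w, h, std::vector<float>(w * h, 0.1f), 1.0f};
    FeatureMap edge {w, h, std::vector<float>(w * h, 0.2f), 1.0f};
    red.data[5] = 0.9f;   // one strongly red location

    // Top-down task modulation (e.g., ``look for the stop sign''):
    // emphasize the contribution of red to the saliency map.
    red.weight = 2.0f;

    std::vector<float> sal = computeSaliency({red, edge});
    std::size_t winner = attend(sal);
    std::cout << "attend to location (" << winner % w << ", "
              << winner / w << ")" << std::endl;
    return 0;
  }

In the actual saliency model, the feature maps also undergo
center-surround filtering and spatial competition/normalization before
being combined, and the winner-take-all is implemented as a dynamical
neural network with inhibition of return rather than a simple argmax;
those steps are omitted here for brevity.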
Not all of the components shown in the figure have been fully
implemented; many are at a very preliminary stage of development, and
some do not yet exist. The interesting point to note
already at this stage, however, is how the biologically-inspired
visual system architecture proposed here is very different from
typical robot vision and computer vision systems, which are usually designed
to solve a specific problem (e.g., find a stop sign by looking for its
specific shape using an algorithm matched to its exact geometrical
properties). This promises to make the systems developed around this
architecture particularly capable when dealing with novel, complex
outdoor scenes and unexpected situations, as has been widely
demonstrated by, for example, our model of bottom-up attention.
Figure: Architecture overview