iLab Forum » C++ Neuromorphic Vision Toolkit » Neuroscience Issues
(Moderators: Forum Admin, Laurent Itti)

Topic: saliency to flicker  (Read 24473 times)

Laurent Itti (YaBB Moderator, Posts: 551)
saliency to flicker
« on: 05/29/02 at 00:31:48 »

sorry to break the news so abruptly, but our current FlickerChannel is a big hoax; it was originally intended only as a first pass at exploiting dynamic information in the saliency model.
 
apart from real motion detectors (implemented, for example, using an approximation to the spatiotemporal energy model of Adelson & Bergen), ideally all features should respond to flicker and on/off events, not only the intensity feature, as is currently the case in the FlickerChannel implementation.
 
The rationale for endowing every channel with some computation equivalent to what we have in FlickerChannel is the typically observed stronger transient response of visual neurons to the onset/offset of a stimulus, compared to their sustained response while the stimulus remains present. See, e.g., Kuffler (1953) in the retina or Hubel & Wiesel (1962) in V1.
 
I would like to propose a modification of the SingleChannel class so that it contains two pyramids (for the current frame and the previous frame), and so that the centerSurround computation returns the center-surround computed on the current frame, plus some coefficient times the result of center-surround applied to the difference between the current and previous pyramid levels (for the same center and surround levels); that is:
 
CS(c, s) = abs( curr_pyr(c) - curr_pyr(s) )
         + alpha * abs( abs( curr_pyr(c) - prev_pyr(c) )
                      - abs( curr_pyr(s) - prev_pyr(s) ) )
 
or something along these lines (to be experimented with; e.g., the inner abs() probably could be removed).
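 
As a minimal sketch, the proposed computation could look something like this in SingleChannel (absDiff(), the image operators, and the pyramid typedef are assumptions here, not the toolkit's actual API; the sketch also assumes levels c and s have already been interpolated to a common resolution):
 
#include <vector>
 
// assumed: the toolkit's Image<float>; a pyramid as one image per level
typedef std::vector< Image<float> > Pyramid;
 
Image<float> centerSurroundFlicker(const Pyramid& currPyr,
                                   const Pyramid& prevPyr,
                                   int c, int s, float alpha)
{
  // spatial center-surround on the current frame: |curr(c) - curr(s)|
  Image<float> spatial = absDiff(currPyr[c], currPyr[s]);
 
  // temporal transients at the center and at the surround levels
  Image<float> transCenter   = absDiff(currPyr[c], prevPyr[c]);
  Image<float> transSurround = absDiff(currPyr[s], prevPyr[s]);
 
  // CS(c,s) = spatial + alpha * | |delta center| - |delta surround| |
  return spatial + alpha * absDiff(transCenter, transSurround);
}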
 
How reasonable does this sound?  It should not break programs that use single frames, but should make movie processing much more realistic (e.g., a blob changing from red to isoluminant green will be detected as salient in the RedGreen channel, while it is not currently detected if truly isoluminant).
 
alternatively to storing the current and previous pyramids, we could store the current pyramid and the difference between the current and previous ones (more computation when we switch to a new frame, but less computation when we do center-surrounds). But the total computation will be equivalent if we call center-surround only once, so storing current and previous seems more general.
 
Even more general would be to have a vector of pyramids, the length of which we can choose at construction.
 
comments appreciated!
 
  -- laurent

Rob Peters (YaBB Senior Member, Posts: 398)
Re: saliency to flicker
« Reply #1 on: 05/29/02 at 08:58:36 »

The implementation sounds reasonable. I wonder though if it would be better to do this in a separate class from SingleChannel (maybe derived from SingleChannel?) so that we can still use the simpler implementation in cases that will have only one frame? It seems like once we're talking about having a vector of Pyramids, we're not far from being able to implement a spatiotemporal motion energy model, which would be great for its own purpose, but might impose a penalty (run-time, source-code complexity) for the single frame case. Or do you think the overhead in run-time and source-code complexity will stay low even if we introduce this new flicker functionality into the existing SingleChannel class?
 
I guess I'm thinking of e.g. the SoxChannel class... this uses an array of GaborChannel objects but doesn't use their center-surround output directly, since SoxChannel has its own kind of lateral inhibition. This means it would also have to implement flicker computation differently... seems like that might get confusing if SingleChannel (and hence GaborChannel) had flicker built in natively. Or maybe that's just programmer's fear of change.
 
Wait though, it would be a real pain to have separate SingleChannel and SingleFlickerChannel classes, because then we'd have to have parallel hierarchies of GaborChannel, GaborFlickerChannel, RedGreenChannel, RedGreenFlickerChannel, etc. That makes it fairly clear that any generic flicker functionality should go directly in SingleChannel. Maybe the goal should just be to keep the differences between flicker/non-flicker as encapsulated as possible.  
 
Are there actually neurons that respond to isoluminant red/green flicker? I thought visual sensitivity to that was quite low. Or would that just be reflected in a low alpha coefficient in your equation?
 
Cheers,
Rob

Dirk Bernhardt-Walther (YaBB Full Member, Posts: 155)
Re: saliency to flicker
« Reply #2 on: 05/29/02 at 10:05:25 »

It sounds reasonable, from a programmer's point of view, to endow SingleChannel with the ability to do multiple-frame processing. I still have a few comments on the physiological relevance:
 
We have now noticed that the flicker channel shouldn't really be a channel on the same level as color, luminance, orientation, etc. Flicker can certainly be detected from temporal changes in each of those other channels - with varying physiological interpretations, though.
 
Color information is processed in the parvocellular pathway, which has a very low sensitivity to temporal changes (i.e., a low temporal cutoff frequency). Temporal changes (coherent motion and blinking, i.e. flicker) are mostly processed in the magnocellular pathway, which does not have color sensitivity.
 
Flicker in the orientation channel would correspond, e.g., to a bar that is flipping between horizontal and vertical. While this is certainly a salient change, is its saliency really created from this orientation change? I don't know. Could as well be the luminance change at the positions where the ends of the bar appear and disappear. Has anybody looked into this?
 
What I want to point out here is: If we want to stay physiologically plausible, then we should watch out for the differences in treating temporal changes in different domains. I'm not so sure that we can treat them all using one generic member function. But this function could of course be virtual ...
If we don't care about physiological plausibility - what the heck.
 
There is another point that I want to draw your attention to: once we get started on giving certain channels special attention, we should also think about the following things:
 
- orientation-sensitive cells are originally built up from center-surround cells (though this probably doesn't make much of a difference computationally).
 
- edges (aka orientation channels) can be built from pretty much any feature's center-surround contrast: color contrasts, texture contrasts, motion contrasts (moving random dot patterns), 3D cues (random dot stereograms).
Right now they are only built using luminance information. I guess taking care of all this would make the code much messier. But, hey, Homo_sapiens::Brain is messy as well, right? And isn't understanding the brain the ultimate frontier of science?

Rob Peters (YaBB Senior Member, Posts: 398)
Re: saliency to flicker
« Reply #3 on: 05/29/02 at 10:23:21 »

on 05/29/02 at 10:05:25, Dirk Walther wrote:
Flicker in the orientation channel would correspond, e.g., to a bar that is flipping between horizontal and vertical. While this is certainly a salient change, is its saliency really created from this orientation change? I don't know. Could as well be the luminance change at the positions where the ends of the bar appear and disappear. Has anybody looked into this?

Well, in my thought experiment, orientation flicker could also occur with a grating that flips orientations. In that case, there would only be luminance changes at a lattice of points where one grating is not overlapped by the other. This would be detected only at a fine spatial scale relative to the size of the grating. Seems to me that the flipping grating would be salient overall at a coarser scale; ergo, the saliency is not just due to luminance changes (in my thought experiment).
 
Cheers,
Rob

Rob Peters (YaBB Senior Member, Posts: 398)
Re: saliency to flicker
« Reply #4 on: 05/29/02 at 10:44:22 »

on 05/29/02 at 10:05:25, Dirk Walther wrote:
What I want to point out here is: If we want to stay physiologically plausible, then we should watch out for the differences in treating temporal changes in different domains. I'm not so sure that we can treat them all using one generic member function.

I was thinking that since temporal information is important in a number of places in the code, it might be worthwhile to have a global class object that represents the temporal simulation. This object would help translate between frame numbers and simulated elapsed time (it could also keep track of real CPU time to give info on how long it takes to simulate 100msec, e.g.). It would also keep track of the current frame number. We have something similar to this now in the vision executable to handle the input/output frame series. But, if we made this kind of object globally accessible, then we could properly model the different temporal sensitivities of different channels: we'd know the time difference represented between successive frames, and we'd have a model of the channels' temporal frequency sensitivity curves, so we could do the Right Thing.
 
Currently we also have temporal simulation in Brain and SM, although I guess the time step there is different from the simulated inter-frame interval. The cleanest way to handle all that would be to drive everything from a temporal event loop... Brain, SM, and the vision executable would register callbacks to say "call me in 100msec" or "call me at 0:00.250"... but this may be overkill for now.
 
I guess what I'm describing are two alternatives for a global TimeSimulator object (a rough sketch of the active version follows the list):
  • Simplest option is a passive object, that just tracks the simulated time and frame count. Easy to implement, but all users have to "play nice" and share control over who gets to advance to the next time step, etc., and make sure that the right things get called at the right simulated times.
  • More complex option is an active object, where vision registers a callback so that it can load the next frame in 100ms, and Brain registers so that it can update the SM in 5ms. This would be more like event-driven GUI programming: your main() function would register a bunch of callbacks and then call timeSimulator.run().
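 
Here is what the active version might look like, as a minimal sketch (all names here are hypothetical, and modern C++ std::function callbacks are used for brevity; this is not the toolkit's actual API):
 
#include <functional>
#include <queue>
#include <utility>
#include <vector>
 
class TimeSimulator {
public:
  typedef std::function<void(float /* now, in ms */)> Callback;
 
  // register a callback to fire at absolute simulated time t (in ms)
  void callAt(float t, Callback cb) { events.push(Event{t, std::move(cb)}); }
 
  // register a callback to fire dt ms from the current simulated time
  void callIn(float dt, Callback cb) { callAt(simTime + dt, std::move(cb)); }
 
  float now() const { return simTime; }
 
  // event loop: dispatch events in time order until none remain
  // (callbacks may re-register themselves to keep the simulation going)
  void run() {
    while (!events.empty()) {
      Event e = events.top();
      events.pop();
      simTime = e.when;   // advance simulated time to the event time
      e.cb(simTime);
    }
  }
 
private:
  struct Event {
    float when;
    Callback cb;
    bool operator>(const Event& other) const { return when > other.when; }
  };
  std::priority_queue<Event, std::vector<Event>, std::greater<Event> > events;
  float simTime = 0.0f;
};
 
Usage would then look like event-driven GUI programming: main() registers a callback that loads the next frame every 100ms and one that updates the SM every 5ms (each re-registering itself before returning), and then calls run().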

 
Cheers,
Rob

Laurent Itti (YaBB Moderator, Posts: 551)
Re: saliency to flicker
« Reply #5 on: 05/29/02 at 12:41:41 »

hey, thanks for the great comments!
 
Quote:
it would be a real pain to have separate SingleChannel and SingleFlickerChannel classes, because then we'd have to have parallel hierarchies of ...

 
right!  I think the overhead would be low if we have a parameter that determines the length of the vector of pyramids, at construction of the channel.  If that length is one, then we won't do any additional computation (except for negligible access to the pyramid through the vector, and other trivial things).
 
now the issue you raise with SoxChannel is a good one; I'll check it out before moving further, so that it can be easily integrated into the proposed scheme.
 
regarding sensitivity to red/green flicker, I'll try it out but I am pretty sure it is salient. Will send a movie over.
 
Quote:
Color information is processed in the parvocellular pathway, which has a very low sensitivity to temporal changes (i.e., a low temporal cutoff frequency).

 
yep, that's a good point. So the way this seems to be turning out would be to have a vector of pyramids with a length that can be decided at construction, to attach a timestamp (in ms since start of simulation) to each new input(), and to have an overloadable function that computes a weight depending on the time difference between current and one of the previous frames. So for slow channels this function would have a low time constant, and conversely.
 
now regarding flicker in the orientation channel, I was thinking of applying the proposed equation to each orientation separately. So just a vertical bar appearing would generate a transient (no need for it to change orientation; if it does, it would generate two transients: e.g., one from disappearing in the vertical orientation and one from appearing in the horizontal one). The luminance channel might be activated as well. The idea here is that if V1 cells really do respond with a stronger initial transient than their sustained response, we can just model that. How the model as a whole behaves for rotating bars will then have to be tested experimentally and matched to data later.
 
Quote:
- orientation-sensitive cells are originally built up from center-surround cells (though this probably doesn't make much of a difference computationally).

 
yep, good point. Maybe we should consider having a true LGN and then V1 (and I'll be back soon regarding V4 and merging HMAX into the saliency model, haha). Then the orientation channel would inherit its sensitivity to transients from the underlying LGN isotropic center-surround cells endowed with transient-response machinery. Sounds good, but maybe overkill for now. How about waiting a bit on that?
 
Quote:
- edges (aka orientation channels) can be built from pretty much any feature's center-surround contrast: color contrasts, texture contrasts, motion contrasts (moving random dot patterns), 3D cues (random dot stereograms).

 
hum, that sounds good too but I'll have to look for some data on that. I don't remember seeing V1 cells that are tuned for red oriented bars on a green background (or similar). That opens a number of interesting doors for simple psychophysics experiments (with eye tracking of course!) Cool!  I'll ruminate on that.
 
Quote:
I was thinking that since temporal information is important in a number of places in the code, it might be worthwhile to have a global class object that represents the temporal simulation. This object would help translate between frame numbers and simulated elapsed time (it could also keep track of real CPU time to give info on how long it takes to simulate 100msec, e.g.). It would also keep track of the current frame number. We have something similar to this now in the vision executable to handle the input/output frame series. But, if we made this kind of object globally accessible, then we could properly model the different temporal sensitivities of different channels: we'd know the time difference represented between successive frames, and we'd have a model of the channels' temporal frequency sensitivity curves, so we could do the Right Thing.

 
right right! I was just thinking of that on my way to work. A simple approach would be to add a float timestamp member to the Image class, representing the time in ms since start of simulation. The image ops would simply preserve the time stamp. Things like frame grabbing would take a time stamp and assign it to the image (e.g., V4Lgrabber::grab()). Then in the channels we would know the various timestamps for the various frames in our vector of pyramids, and could compute the transients according to the temporal properties of those channels.
 
but the active version of your TimeSimulator class sounds great. We do have major scheduling problems in the beobot implementation, where so far we have had to "play nice", as you say (i.e., the framegrabber sets the pace, and the various processing modules on the various CPUs must guarantee that they will send their results back in less than a frame interval, otherwise we start lagging behind). In some of the parallel code we have checks so that if the receiving FIFO of things to process gets too long, we kill it so that we can catch up. But it would be nice to have a fully asynchronous system in which various modules could take various amounts of time (e.g., process the basic features at 30fps; then use those to do some object recognition, which will take an unknown, variable amount of time; whenever an object has been recognized, we are ready for the next object, using the low-level features that are available at that point).
 
ok, lots of good stuff here!  In a first step, how about:
 
- SingleChannel has a vector of pyramids with len determined at construction
- SingleChannel::input() has an additional parameter "timestamp" in ms
- in parallel with the vector of frames we keep a vector of time stamps
- SingleChannel has an additional method that computes a weight depending on inter-frame difference and a time constant that can also be set at construction
 
or we directly add a "float timestamp" to Image so that we don't have to deal with time stamps separately from the frames (separate param, separate vector, etc)?
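 
To make this concrete, here is a minimal sketch of those four changes (names and types are placeholders, not the actual toolkit API; a deque is used for the sliding window of frames, and exponential decay is just one possible weight function):
 
#include <cmath>
#include <cstddef>
#include <deque>
 
// ImagePyramid: stands in for the toolkit's pyramid type
// (e.g. a vector of Image<float> levels)
class SingleChannel {
public:
  // queue length and time constant are fixed at construction
  SingleChannel(std::size_t queueLen, float timeConstantMs)
    : maxLen(queueLen), tau(timeConstantMs) {}
 
  virtual ~SingleChannel() {}
 
  // input() now also takes a timestamp, in ms since start of simulation
  void input(const ImagePyramid& pyr, float timestampMs) {
    pyramids.push_front(pyr);
    timestamps.push_front(timestampMs);
    if (pyramids.size() > maxLen) {   // slide the window
      pyramids.pop_back();
      timestamps.pop_back();
    }
  }
 
  // overloadable weight as a function of the inter-frame interval
  virtual float transientWeight(std::size_t framesBack) const {
    const float dt = timestamps.front() - timestamps.at(framesBack);
    return std::exp(-dt / tau);
  }
 
protected:
  std::deque<ImagePyramid> pyramids;  // newest frame at the front
  std::deque<float> timestamps;       // kept in parallel with pyramids
  std::size_t maxLen;                 // length of the pyramid queue
  float tau;                          // channel time constant, in ms
};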
 
best,
 
  -- laurent

Rob Peters (YaBB Senior Member, Posts: 398)
Re: saliency to flicker
« Reply #6 on: 05/29/02 at 13:10:04 »

on 05/29/02 at 12:41:41, Laurent Itti wrote:
- SingleChannel has a vector of pyramids with len determined at construction
- SingleChannel::input() has an additional parameter "timestamp" in ms
- in parallel with the vector of frames we keep a vector of time stamps
- SingleChannel has an additional method that computes a weight depending on inter-frame difference and a time constant that can also be set at construction
 
or we directly add a "float timestamp" to Image so that we don't have to deal with time stamps separately from the frames (separate param, separate vector, etc)?

Sounds good, except I'd suggest we don't add a timestamp directly to the Image class... this would just be excess baggage and wouldn't fit conceptually in most places where Image objects are used (e.g. temp arrays during computation, etc.) But instead of having to keep parallel vectors of frames and timestamps in SingleChannel, we can just keep a vector of some simple struct:
 
struct StampedImage {
  Image<float> img;
  float stamp;
};

 
If needed, StampedImage could be publicly visible rather than an implementation detail of SingleChannel, so that StampedImage objects could be created with the proper timestamp from the get-go, e.g. in bin/vision. Or, if we have a TimeSimulator class, we could have a StampedImage constructor that takes an Image object and then fetches the timestamp by asking the TimeSimulator for the current simtime.
 
Then we can have
 
std::vector<StampedImage> history;
 
Actually a std::deque or std::list would make more sense since we're going to be push_back()'ing and pop_front()'ing, and pop_front() is not even provided by std::vector (erasing from the front of a vector is O(n)).
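 
For example, a sliding window of the last N stamped frames (N is just an illustrative constant here):
 
#include <deque>
 
std::deque<StampedImage> history;
// on each new frame:
//   history.push_back(newest);                    // newest at the back
//   if (history.size() > N) history.pop_front();  // O(1) with a deque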
 
Or in fact aren't we talking about a vector of Pyramid objects rather than of Image objects?
 
Cheers,
Rob

Laurent Itti (YaBB Moderator, Posts: 551)
Re: saliency to flicker
« Reply #7 on: 05/29/02 at 13:26:56 »

yep, it's a vector of pyramids. How about a derived StampedImage and StampedPyramid, rather than a struct?
 
yep, a deque makes sense since indeed what we want is a sliding queue (push from one side, pop from the other).
 
best,
 
  -- laurent

Rob Peters (YaBB Senior Member, Posts: 398)
Re: saliency to flicker
« Reply #8 on: 05/29/02 at 13:54:39 »

on 05/29/02 at 13:26:56, Laurent Itti wrote:
yep, it's a vector of pyramids. How about a derived StampedImage and StampedPyramid, rather than a struct?

I guess my reflex is to always avoid inheritance unless no alternative is available... with inheritance you have to duplicate the constructors of the base class if you want to keep the same constructors available, and you have to be careful about the destructors--if the base class destructor is not virtual, then the derived class destructor will not be called if the object is deleted through a pointer to the base class.
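 
For illustration, a minimal (hypothetical) example of that destructor pitfall:
 
#include <cstdio>
 
struct BaseImage {        // note: destructor is not virtual
  ~BaseImage() { std::printf("~BaseImage\n"); }
};
 
struct StampedImage : public BaseImage {
  ~StampedImage() { std::printf("~StampedImage\n"); }
};
 
int main() {
  BaseImage* p = new StampedImage;
  delete p;   // undefined behavior: ~StampedImage() is never run
  return 0;
}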
 
Also, with inheritance you can end up needing multiple parallel hierarchies if the class has to be configured along more than one orthogonal dimension (e.g. the GaborChannel/GaborFlickerChannel issue).
 
Granted, these issues aren't likely to cause problems in this case, so either way is probably ok; it's just a matter of preference...
 
Cheers,
Rob

Dirk Bernhardt-Walther (YaBB Full Member, Posts: 155)
Re: saliency to flicker
« Reply #9 on: 05/29/02 at 14:03:43 »

Quote:
Sounds good, except I'd suggest we don't add a timestamp directly to the Image class... this would just be excess baggage and wouldn't fit conceptually in most places where Image objects are used (e.g. temp arrays during computation, etc.) But instead of having to keep parallel vectors of frames and timestamps in SingleChannel, we can just keep a vector of some simple struct:
 
struct StampedImage {
  Image<float> img;
  float stamp;
};

 
I've had some experience for a while now with a similar construct, labelImage (see VisualCortex.H). It is rather cumbersome. What is the big deal with attaching just another float member to each image and/or pyramid?
 
Dirk

Rob Peters (YaBB Senior Member, Posts: 398)
Re: saliency to flicker
« Reply #10 on: 05/29/02 at 14:49:03 »

on 05/29/02 at 14:03:43, Dirk Walther wrote:
I've had some experience for a while now with a similar construct, labelImage (see VisualCortex.H). It is rather cumbersome. What is the big deal with attaching just another float member to each image and/or pyramid?

 
  • My main issue is that attaching extra members like this doesn't keep things modular. If the new member will be irrelevant for most of the clients of Image, then it becomes a maintenance problem (more conceptual complexity, more recompiles, etc.).
  • For a core class like Image with lots of related functions, it's not an extensible approach; we'd end up with lots of new members being added eventually. This quickly becomes more cumbersome than the alternative.
  • Each function related to Image (e.g. the mathematical operators) would have to decide what to do with e.g. the timestamp or the label or both. This is not functionality that belongs in operator+().
  • Re: inheritance: we end up with problems with any function that processes an Image and returns a new Image as a result; the result will not "know anything" about the additional members, so assigning the result to a variable of the derived type will give unexpected results. Basically the problem is that "derivedObject is-a baseObject", but "baseObject is-not-a derivedObject", and the return value from such a function is a baseObject.

    Image<float> foo(const Image<float>& x);
     
    class StampedImage : public Image<float> { /* ... */ };
     
    StampedImage x;
    StampedImage y = foo(x);  

    Now foo() has essentially stripped away the timestamp information, and StampedImage's ctor will have to recreate it. Either this will not compile (if StampedImage does not define a ctor taking an Image<float>), or it may give wrong results (e.g. if StampedImage's ctor fetches a new time stamp, then y has a different time stamp than x, which may or may not be what we want).
     
    The point is that these (important) issues should not have to be addressed in the Image class, but rather in the specific code that is making use of e.g. the time-stamps. With an aggregated struct, we'd have:

    struct StampedImage {
      // Just one ctor needed
      StampedImage(const Image<float>& i, float t) : img(i), time(t) {}
      Image<float> img;
      float time;
    };
     
    StampedImage x;
    StampedImage y( foo(x.img), x.time );
    // OR
    StampedImage z( foo(x.img), simtimer.now() );

Cheers,
Rob

Laurent Itti (YaBB Moderator, Posts: 551)
Re: saliency to flicker
« Reply #11 on: 05/29/02 at 22:13:19 »

generally speaking, I tend to agree with Rob, but in this case we could argue that a timestamp is a core component of an image. Not too long ago I was at a military workshop, where they were making a big push towards enforcing that no digital image be taken without at least having a UTC timestamp and GPS coordinates associated with it. In fact, some DV camcorders now do that (time+GPS).
 
in any case, let's ruminate a bit on this and I'll get started on the implementation one of these days.
 
Dirk: when is your SURF student who is going to implement Adelson & Bergen starting?
 
best,
 
  -- laurent

Dirk Bernhardt-Walther (YaBB Full Member, Posts: 155)
Re: saliency to flicker
« Reply #12 on: 05/30/02 at 10:08:14 »

Quote:
Dirk: when is your SURF student who is going to implement Adelson & Bergen starting?

Soon. I just emailed him. Our finals are next week, so I guess he'll start by the end of next week. I'll first have him figure out how to handle video data - grab videos from the camera, save and load videos, etc. Do you have any suggestions on where we should start?
Also, with something like the vector of pyramids, we could go about exploring different ways of motion processing - flicker, Adelson & Bergen, correlating two successive frames shifted by x pixels, etc.
 
Who is going to implement the vector-of-pyramids thing? Chuck could do it as a first exercise with the code. Or do you want to do it?
 
Dirk

Laurent Itti (YaBB Moderator, Posts: 551)
Re: saliency to flicker
« Reply #13 on: 05/30/02 at 12:39:08 »

regarding video, do you have V4L-driven hardware?
 
if so, have a look at V4Lgrabber.H/C and test-grab.C right here in src3/. If you have firewire, try IEEE1394grabber.H/C.
 
yeah, shifted differences seem like the way to go. I'll see if I have time to do the vector-of-pyramids stuff. It may be a bit abrupt as a starter project, since the whole channel business has become a bit complicated over time. But looking at various shifted-difference schemes would be a good starter, I think.
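 
For the shifted-difference idea, a minimal sketch (the Image accessors used here are assumptions, not necessarily the toolkit's actual API): compare the current frame against the previous frame shifted by (dx, dy); small residuals indicate motion at roughly that velocity.
 
#include <cmath>
 
Image<float> shiftedDiff(const Image<float>& curr,
                         const Image<float>& prev,
                         int dx, int dy)
{
  Image<float> result(curr.getWidth(), curr.getHeight());
  for (int y = 0; y < curr.getHeight(); ++y)
    for (int x = 0; x < curr.getWidth(); ++x) {
      const int px = x - dx, py = y - dy;
      const bool inside = (px >= 0 && px < prev.getWidth() &&
                           py >= 0 && py < prev.getHeight());
      const float p = inside ? prev.getVal(px, py) : 0.0f;
      result.setVal(x, y, std::fabs(curr.getVal(x, y) - p));
    }
  return result;
}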
 
best,
 
  -- laurent

Dirk Bernhardt-Walther (YaBB Full Member, Posts: 155)
Re: saliency to flicker
« Reply #14 on: 05/30/02 at 13:13:30 »

Quote:
regarding video, do you have V4L-driven hardware?
 if so, have a look at V4Lgrabber.H/C and test-grab.C right here in src3/. If you have firewire, try IEEE1394grabber.H/C.

Yep, my stuff is all V4L. But I wanted my student not to have to deal with the details of controlling the grabber hardware. I was more referring to file formats for videos - e.g. multiframe pgm. Are there any libraries that we could use for this?
 
Also, do you have suggestions for software that:  
- grabs a video from V4L to disc
- converts video format as needed
- displays a movie from multiframe pgm?
 
We don't have to re-invent the wheel if all this already exists.
 
Cheers
 
Dirk