GistEstimatorBeyondBoF.H

00001 /**
00002    \file Neuro/GistEstimatorBeyondBoF.H
00003 
00004    \brief Implementation of ``Beyond Bags of Features: Spatial Pyramid
00005    Matching for Recognizing Natural Scene Categories'' by Lazebnik, et
00006    al.
00007 
00008    The GistEstimatorBeyondBoF class implements the following paper
00009    within the INVT framework:
00010 
00011    Lazebnik, S., Schmid, C., Ponce, J.
00012    Beyond Bags of Features: Spatial Pyramid Matching for Recognizing
00013       Natural Scene Categories
00014    CVPR, 2006.
00015 
00016    In the paper, the authors describe the use of weak features (oriented
00017    edge points) and strong features (SIFT descriptors) as the basis for
00018    classifying images. In this implementation, however, we only concern
00019    ourselves with strong features clustered into 200 categories (i.e.,
00020    the vocabulary size is 200 ``words''). Furthermore, we restrict the
00021    spatial pyramid used as part of the matching process to 2 levels.
00022 
00023    We restrict ourselves to the above configuration because, as Lazebnik
00024    et al. report, it yields near-best results (a vocabulary size of 400
00025    does slightly better, but not by much).
00026 
00027    To compute the gist vector for an image, we first divide the image
00028    into 16x16 pixel patches and compute SIFT descriptors for each of
00029    these patches. We then assign these descriptors to bins corresponding
00030    to the nearest of the 200 SIFT descriptors (vocabulary) gleaned from
00031    the training phase. This grid of SIFT descriptor indices is then
00032    converted into a feature map that specifies the grid coordinates for
00033    each of the 200 feature types. This map allows us to compute the
00034    multi-level histograms as described in the paper. The gist vectors we
00035    are interested in are simply the concatenation of all the multi-level
00036    histograms into a flat array of numbers.
00037 
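   As a rough illustration of the quantization step described above, the
   sketch below assigns each descriptor in a grid to its nearest
   vocabulary entry and records the grid coordinates per entry. It is not
   this class's actual code: Descriptor, GridPoint, nearest_word and
   build_feature_map are hypothetical names, descriptors are plain
   std::array objects rather than INVT Images, and squared Euclidean
   distance is assumed as the measure of nearness.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-ins for the INVT types used by the real class.
typedef std::array<float, 128> Descriptor ;
struct GridPoint { int x, y ; } ;

// Index of the vocabulary entry closest to d (squared Euclidean
// distance; an assumption made for this sketch).
std::size_t nearest_word(const Descriptor& d,
                         const std::vector<Descriptor>& vocabulary)
{
   assert(!vocabulary.empty()) ;
   std::size_t best = 0 ;
   float best_dist = std::numeric_limits<float>::max() ;
   for (std::size_t w = 0; w < vocabulary.size(); ++w) {
      float dist = 0 ;
      for (std::size_t i = 0; i < d.size(); ++i) {
         const float diff = d[i] - vocabulary[w][i] ;
         dist += diff * diff ;
      }
      if (dist < best_dist) {
         best_dist = dist ;
         best = w ;
      }
   }
   return best ;
}

// Feature map: for each vocabulary entry, the grid coordinates of the
// descriptors assigned to it. The grid is stored row by row and is
// `width` cells wide.
std::vector<std::vector<GridPoint> >
build_feature_map(const std::vector<Descriptor>& grid, int width,
                  const std::vector<Descriptor>& vocabulary)
{
   assert(width > 0) ;
   std::vector<std::vector<GridPoint> > feature_map(vocabulary.size()) ;
   for (std::size_t k = 0; k < grid.size(); ++k) {
      GridPoint p = {static_cast<int>(k) % width,
                     static_cast<int>(k) / width} ;
      feature_map[nearest_word(grid[k], vocabulary)].push_back(p) ;
   }
   return feature_map ;
}
```

   Counting the points of each feature_map entry that fall inside a given
   pyramid cell would then give that cell's histogram bin for that entry.
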
00038    Once we have these gist vectors, we can classify images using an SVM.
00039    The SVM kernel is the histogram intersection function, which takes the
00040    gist vectors for the input and training images and returns the sum,
00041    over all dimensions, of the smaller of the two corresponding values
00042    (once again, see the paper for the gory details).
00043 
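   The histogram intersection mentioned above can be sketched as follows.
   This is only an illustration on plain std::vector objects, not code
   from this class or from any SVM package; the function name
   histogram_intersection is hypothetical, and the two gist vectors are
   assumed to have the same dimensionality.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Histogram intersection of two gist vectors: the sum, over all
// dimensions, of the smaller of the two corresponding entries.
double histogram_intersection(const std::vector<double>& a,
                              const std::vector<double>& b)
{
   assert(a.size() == b.size()) ; // same dimensionality required
   double sum = 0 ;
   for (std::size_t i = 0; i < a.size(); ++i)
      sum += std::min(a[i], b[i]) ;
   return sum ;
}
```

   A client whose SVM package accepts precomputed kernel values could
   evaluate this function for each pair of gist vectors it needs to
   compare.
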
00044    This class, viz., GistEstimatorBeyondBoF, only computes gist vectors
00045    (i.e., normalized multi-level histograms) given the 200 ``word''
00046    vocabulary of SIFT descriptors to serve as the bins for the
00047    histograms. The actual training and classification must be performed
00048    by client programs. To assist with the training process, however, this
00049    class sports a training mode in which it does not require the SIFT
00050    descriptors database. Instead, in training mode, it simply returns the
00051    raw grid of SIFT descriptors for the images it is supplied. This
00052    allows clients to store those descriptors for clustering, etc.
00053 */
00054 
00055 // //////////////////////////////////////////////////////////////////// //
00056 // The iLab Neuromorphic Vision C++ Toolkit - Copyright (C) 2000-2005   //
00057 // by the University of Southern California (USC) and the iLab at USC.  //
00058 // See http://iLab.usc.edu for information about this project.          //
00059 // //////////////////////////////////////////////////////////////////// //
00060 // Major portions of the iLab Neuromorphic Vision Toolkit are protected //
00061 // under the U.S. patent ``Computation of Intrinsic Perceptual Saliency //
00062 // in Visual Environments, and Applications'' by Christof Koch and      //
00063 // Laurent Itti, California Institute of Technology, 2001 (patent       //
00064 // pending; application number 09/912,225 filed July 23, 2001; see      //
00065 // http://pair.uspto.gov/cgi-bin/final/home.pl for current status).     //
00066 // //////////////////////////////////////////////////////////////////// //
00067 // This file is part of the iLab Neuromorphic Vision C++ Toolkit.       //
00068 //                                                                      //
00069 // The iLab Neuromorphic Vision C++ Toolkit is free software; you can   //
00070 // redistribute it and/or modify it under the terms of the GNU General  //
00071 // Public License as published by the Free Software Foundation; either  //
00072 // version 2 of the License, or (at your option) any later version.     //
00073 //                                                                      //
00074 // The iLab Neuromorphic Vision C++ Toolkit is distributed in the hope  //
00075 // that it will be useful, but WITHOUT ANY WARRANTY; without even the   //
00076 // implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR      //
00077 // PURPOSE.  See the GNU General Public License for more details.       //
00078 //                                                                      //
00079 // You should have received a copy of the GNU General Public License    //
00080 // along with the iLab Neuromorphic Vision C++ Toolkit; if not, write   //
00081 // to the Free Software Foundation, Inc., 59 Temple Place, Suite 330,   //
00082 // Boston, MA 02111-1307 USA.                                           //
00083 // //////////////////////////////////////////////////////////////////// //
00084 //
00085 // Primary maintainer for this file: Manu Viswanathan <mviswana at usc dot edu>
00086 // $HeadURL: svn://isvn.usc.edu/software/invt/trunk/saliency/src/Neuro/GistEstimatorBeyondBoF.H $
00087 // $Id: GistEstimatorBeyondBoF.H 11138 2009-04-22 22:02:38Z mviswana $
00088 //
00089 
00090 #ifndef GE_BEYOND_BAGS_OF_FEATURES_H_DEFINED
00091 #define GE_BEYOND_BAGS_OF_FEATURES_H_DEFINED
00092 
00093 //------------------------------ HEADERS --------------------------------
00094 
00095 // Gist specific headers
00096 #include "Neuro/GistEstimator.H"
00097 
00098 // Other INVT headers
00099 #include "SIFT/Keypoint.H"
00100 #include "Neuro/NeuroSimEvents.H"
00101 
00102 // Standard C++ headers
00103 #include <ostream>
00104 #include <list>
00105 #include <string>
00106 
00107 //------------------------- CLASS DEFINITION ----------------------------
00108 
00109 /**
00110    \class GistEstimatorBeyondBoF
00111    \brief Gist estimator for ``Beyond Bags of Features ...'' by Lazebnik,
00112    et al.
00113 
00114    This class computes the gist vector for an input image using the
00115    feature extraction and spatial pyramid matching scheme described in
00116    sections 4 and 3 (respectively) of ``Beyond Bags of Features: Spatial
00117    Pyramid Matching for Recognizing Natural Scene Categories'' by
00118    Lazebnik, et al.
00119 
00120    While the authors of the above-mentioned paper experiment with weak
00121    features (oriented edge points) and strong features (SIFT descriptors)
00122    and with different resolutions of the spatial matching pyramid, this
00123    class only implements strong features clustered into 200 categories
00124    and a two-level pyramid because other configurations were either not
00125    as good as this one or did not offer any significant advantages over
00126    it (as reported in the paper).
00127 
00128    Thus, in the terminology of Lazebnik, et al., we have the number of
00129    channels M = 200 and the finest pyramid level L = 2 (i.e., the
00130    pyramid spans levels 0, 1 and 2). This results in gist vectors of
00131    dimensionality:
00132 
00133       M * (4^(L+1) - 1)/3 = M * (2^(2L+2) - 1)/3
00134                           = 200 * (2^(2*2+2) - 1)/3
00135                           = 200 * (2^6 - 1)/3
00136                           = 200 * 63/3
00137                           = 200 * 21
00138                           = 4200
00139 
00140    See the paper for the gory details.
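
   The arithmetic above can be checked with a small sketch. The function
   names pyramid_cells and expected_gist_size are hypothetical and not
   part of this class; the code only assumes, as the formula does, that
   level l of the pyramid has 4^l cells and that every cell contributes
   one histogram bin per channel.

```cpp
#include <cassert>

// Number of grid cells in a pyramid whose finest level is `levels`:
// level l has 4^l cells, so the total is 1 + 4 + ... + 4^levels.
int pyramid_cells(int levels)
{
   assert(levels >= 0) ;
   int total = 0 ;
   int cells = 1 ; // 4^0 cells at level 0
   for (int l = 0; l <= levels; ++l) {
      total += cells ;
      cells *= 4 ;
   }
   return total ;
}

// Gist vector dimensionality: one histogram bin per channel per cell.
int expected_gist_size(int channels, int levels)
{
   return channels * pyramid_cells(levels) ;
}
```

   With channels = 200 and levels = 2, expected_gist_size() yields the
   same 4200 as the derivation above.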
00141 */
00142 class GistEstimatorBeyondBoF : public GistEstimatorAdapter {
00143    // This ought to work fine, but for some weird reason, doesn't. The
00144    // 32-bit nightly builds are carried out by GCC version 3.4.1 and it
00145    // compiles this and the .C files fine. But the linker chokes and
00146    // complains about undefined references to these variables. This does
00147    // not seem to be a problem with newer versions of GCC (>= 4.2).
00148    //
00149    // Oddly enough, other classes (e.g., GistEstimatorContextBased and
00150    // GistEstimatorTexton) that also define static const members do not
00151    // suffer from this affliction. But those classes define static const
00152    // uints rather than static const ints. Still, this doesn't make
00153    // sense!
00154    //
00155    // Anyhoo, just to resolve this issue: we explicitly define these
00156    // static data members in the .C file.
00157    //static const int NUM_CHANNELS = 200 ;
00158    //static const int NUM_LEVELS   = 2 ;
00159    static int NUM_CHANNELS ;
00160    static int NUM_LEVELS   ;
00161    static int GIST_VECTOR_SIZE ;
00162 
00163 public:
00164    /// Accessors for some parameters used internally by the Lazebnik
00165    /// algorithm.
00166    ///@{
00167    static int num_channels()     {return NUM_CHANNELS ;}
00168    static int num_levels()       {return NUM_LEVELS ;}
00169    static int gist_vector_size() {return GIST_VECTOR_SIZE ;}
00170    ///@}
00171 
00172    /// Modifiers for some parameters used internally by the Lazebnik
00173    /// algorithm.
00174    ///
00175    /// WARNING: These methods should be used with care. Do not call them
00176    /// to change the SIFT vocabulary size or the number of pyramid levels
00177    /// partway through a run. That is, if you use 100 channels and a
00178    /// 4-level pyramid for training, don't switch to 200 channels and a
00179    /// 3-level pyramid during the testing phase!
00180    ///@{
00181    static void num_channels(int) ;
00182    static void num_levels(int) ;
00183    ///@}
00184 
00185    /// The constructor expects to be passed an option manager, which it
00186    /// uses to set itself up within the INVT simulation framework.
00187    GistEstimatorBeyondBoF(OptionManager& mgr,
00188       const std::string& descrName = "GistEstimatorBeyondBoF",
00189       const std::string& tagName   = "GistEstimatorBeyondBoF") ;
00190 
00191    /// A SIFT descriptor is just a vector of 128 numbers. The following
00192    /// inner class provides a convenient wrapper for this vector.
00193    struct SiftDescriptor {
00194       static const int SIZE = 128 ;
00195 
00196       SiftDescriptor() ;
00197       SiftDescriptor(const Image<float>&) ;
00198       SiftDescriptor(const rutz::shared_ptr<Keypoint>&) ;
00199 
00200       // WARNING: NO RANGE CHECK! i must lie in [0, 128).
00201       float& operator[](int i) {return values[i] ;}
00202       const float& operator[](int i) const {return values[i] ;}
00203    private :
00204       float values[SIZE] ;
00205    } ;
00206 
00207    /// Like other gist estimators, this one too filters the input image.
00208    /// Its filtering process involves subdividing the input image into
00209    /// 16x16 pixel patches and running SIFT on each of these patches. The
00210    /// filtering results are, therefore, a grid of SIFT descriptors.
00211    /// The following type is used to represent these results.
00212    typedef Image<SiftDescriptor> SiftGrid ;
00213 
00214    /// In order to compute a gist vector, this estimator needs to know
00215    /// the vocabulary associated with the bag of features. This
00216    /// vocabulary is usually obtained by a clustering process as part of
00217    /// the training. Each ``word'' or ``vis-term'' in this vocabulary is
00218    /// also known as a channel. The gist vector essentially represents a
00219    /// ``spatial histogram'' for each of these channels.
00220    ///
00221    /// The vocabulary itself is merely a collection of 200 SIFT
00222    /// descriptors (the cluster centroids). Clients pass it to
00223    /// setVocabulary() as a 128x200 Image of floating point numbers
00224    /// (128 being the dimensionality of a SIFT descriptor); internally,
00225    /// it is stored as a list of SiftDescriptor objects.
00226    typedef std::list<SiftDescriptor> Vocabulary ;
00227 
00228    /// This method should be called once during the client's
00229    /// initialization process prior to attempting to obtain gist
00230    /// vectors for input images. Thus, the clustering phase of the
00231    /// training must be complete before this estimator can be used to
00232    /// compute gist vectors.
00233    void setVocabulary(const Image<float>&) ;
00234 
00235    /// To assist with training, GistEstimatorBeyondBoF can be configured
00236    /// to operate in a special training mode in which it does not have a
00237    /// vocabulary from which to form gist vectors but rather simply
00238    /// passes back (to its client) the grid of SIFT descriptors for the
00239    /// input image, i.e., the results of the filtering step. The
00240    /// client may then store these descriptors, perform the clustering
00241    /// required to create the vocabulary necessary for subsequent normal
00242    /// use of this gist estimator, and then run the estimator in
00243    /// non-training mode to compute the actual gist vectors.
00244    ///
00245    /// Training mode is set by specifying a hook function that accepts
00246    /// the filtering results, i.e., the grid/Image of SIFT
00247    /// descriptors.
00248    typedef void (*TrainingHook)(const SiftGrid&) ;
00249 
00250    /// This method should be called once during the client's
00251    /// initialization sequence to specify the training mode hook
00252    /// function, which configures GistEstimatorBeyondBoF to run in
00253    /// training mode. If this hook is not specified, the estimator will
00254    /// run in ``normal'' mode and compute gist vectors from the vocabulary.
00255    ///
00256    /// It is an error to specify neither the training hook nor the
00257    /// vocabulary. If both are specified, the training hook takes
00258    /// precedence, i.e., the estimator runs in training mode, wherein it
00259    /// passes back filtering results (grid of SIFT descriptors) to the
00260    /// client rather than computing gist vectors.
00261    void setTrainingHook(TrainingHook) ;
00262 
00263    /// Return the gist vector (not computed in training mode).
00264    Image<double> getGist() ;
00265 
00266    /// Destructor
00267    virtual ~GistEstimatorBeyondBoF() ;
00268 
00269 protected:
00270    /// Callback for when a new input (retina) frame is available
00271    SIMCALLBACK_DECLARE(GistEstimatorBeyondBoF, SimEventRetinaImage);
00272 
00273 private :
00274    Image<double> itsGistVector ; // gist feature vector
00275    Vocabulary    itsVocabulary ;
00276    TrainingHook  itsTrainingHook ;
00277 } ;
00278 
00279 //---------------------- MISCELLANEOUS FUNCTIONS ------------------------
00280 
00281 std::ostream& operator<<(std::ostream&,
00282                          const GistEstimatorBeyondBoF::SiftDescriptor&) ;
00283 
00284 //-------------------- INLINE FUNCTION DEFINITIONS ----------------------
00285 
00286 inline Image<double> GistEstimatorBeyondBoF::getGist()
00287 {
00288    return itsGistVector ;
00289 }
00290 
00291 inline void
00292 GistEstimatorBeyondBoF::
00293 setTrainingHook(GistEstimatorBeyondBoF::TrainingHook H)
00294 {
00295    itsTrainingHook = H ;
00296 }
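
// A minimal sketch of how a client might collect descriptors in training
// mode, assuming stand-in types rather than the real INVT ones:
// Descriptor, Grid, Hook, collect_descriptors and process_frame are
// hypothetical names, and process_frame merely mimics an estimator that
// invokes the hook once per input image. Because TrainingHook is a plain
// function pointer, it cannot carry state of its own, so the sketch
// accumulates descriptors in a file-scope container.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for SiftDescriptor and SiftGrid.
typedef std::array<float, 128> Descriptor ;
typedef std::vector<Descriptor> Grid ;

// Same shape as TrainingHook: a plain function pointer.
typedef void (*Hook)(const Grid&) ;

// File-scope store that the client would later hand to its clustering
// code in order to build the vocabulary.
static std::vector<Descriptor> collected_descriptors ;

// The hook itself: append every descriptor of the grid to the store.
void collect_descriptors(const Grid& grid)
{
   collected_descriptors.insert(collected_descriptors.end(),
                                grid.begin(), grid.end()) ;
}

// Stand-in for the estimator: calls the hook, if one is set, with the
// grid computed for the current frame.
void process_frame(const Grid& grid, Hook hook)
{
   if (hook)
      hook(grid) ;
}
```

// With the real class, a function of this shape (taking a const
// SiftGrid&) would be passed to setTrainingHook() during the client's
// initialization.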
00297 
00298 //-----------------------------------------------------------------------
00299 
00300 #endif
00301 
00302 /* So things look consistent in everyone's emacs... */
00303 /* Local Variables: */
00304 /* indent-tabs-mode: nil */
00305 /* End: */