Dirichlet Kullback-Leibler

From ILabWiki

Table of contents

Derivation of the Kullback-Leibler (KL) distance for the Dirichlet Probability Distribution

Work In Progress

This is a work in progress. Please feel free to add to it or make comments.

The Dirichlet Probability Distribution (PDF)

The Direchlet PDF (http://en.wikipedia.org/wiki/Dirichlet_distribution) is a generalization of the Binomial PDF (http://en.wikipedia.org/wiki/Binomial_distribution) in that it is computed over more than two possible outcomes. It is also a special multinomial case of the Gamma PDF where the Rate β parameter is fixed as 1. It is given by the formula:

f(x_1, \dots, x_K; \alpha_1, \dots, \alpha_K) = \frac{1}{\mathrm{B}(\alpha)} \prod_{i=1}^K x_i^{\alpha_i - 1}

Where the denominator B(α) is a normalizing multinomial beta constant. Given as:

\mathrm{B}(\alpha) = \frac{\prod_{i=1}^K \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^K \alpha_i\right)}

The Kullback-Leibler Divergence/Distance (KL)

The KL distance (http://en.wikipedia.org/wiki/Kullback-Leibler_divergence) is a measure of information distance between two PDF's. Notably we wish to know how much information in terms of information defined by Shannon, two PDF's have in common. This is given by a basic formula:

L =  - \int {p\left( {\mathbf{x}} \right)\ln \frac{{\tilde p\left( {\mathbf{x}} \right)}} {{p\left( {\mathbf{x}} \right)}}} d{\mathbf{x}}

The KL Distance for a Dirichlet PDF

To find the KL distance we will need to solve the integral given over the Dirichlet PDF. First we will introduce a simple notation designed as a short hand to make the math a little cleaner.

\left(1\right) \qquad \phi = \frac{\prod_{i=1}^K \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^K \alpha_i\right)}

and

\left(2\right) \qquad \theta = \prod_{i=1}^K x_i^{\alpha_i - 1}

Since we are integrating over x The first step is to move pure α terms out. So we start with:

\left(3\right) \qquad L =  - \frac{1} {\phi } \cdot \int \theta \cdot \ln \frac{{\tilde p\left( {\mathbf{x}} \right)}} {{p\left( {\mathbf{x}} \right)}}

Next we make things easier by utilizing the logarithmic identities and expanding out the log

\left(4\right) \qquad L =  - \frac{1} {\phi } \cdot \int \theta   \cdot \left( {\ln \tilde p\left( {\mathbf{x}} \right) - \ln p\left( {\mathbf{x}} \right)} \right)

which becomes:

\left(5\right) \qquad L =  - \frac{1} {\phi } \cdot \int \theta   \cdot \ln \tilde p\left( {\mathbf{x}} \right) - \theta  \cdot \ln p\left( {\mathbf{x}} \right)

Next we will expand a little further. The logarithm allows us to expand quite a bit into a large number of sumations:

\left(6\right) \qquad \theta  \cdot \ln p\left( {\mathbf{x}} \right) = \theta  \cdot \ln \frac{1} {\phi } + \theta  \cdot \ln \theta

As we expand we see that

\left(7\right) \qquad \theta  \cdot \ln p\left( {\mathbf{x}} \right) = \theta  \cdot \ln \left( 1 \right) - \theta  \cdot \ln \left( {\prod\limits_{i = 1}^K {\Gamma \left( {\alpha _i } \right)} } \right) + \theta  \cdot \ln \left( {\Gamma \left( {\sum\limits_{i = 1}^K {\alpha _i } } \right)} \right) + \theta  \cdot \ln \left( {\prod\limits_{i = 1}^K {x_i^{\alpha _i  - 1} } } \right)

Integration of the Parts

Which shows us that we have really only to solve four integrals and then we are done. The first three are trivial since we can compute the integral over the product as a multiple integral for each xi and the other part contains only α and as such is a constant.

\left(1\right) \qquad I_1  = \int {\theta  \cdot \ln \left( 1 \right)dx } = 0
\left(2\right) \qquad I_2  = \int {\theta  \cdot \ln \left( {\prod\limits_{i = 1}^K {\Gamma \left( {\alpha _i } \right)} } \right)dx = } \ln \left( {\prod\limits_{i = 1}^K {\Gamma \left( {\alpha _i } \right)} } \right) \cdot \int {\theta dx}  = \ln \left( {\prod\limits_{i = 1}^K {\Gamma \left( {\alpha _i } \right)} } \right) \cdot {\prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i} } {\alpha _i} } }
\left(3\right) \qquad \tilde I_2 = \ln \left( {\prod\limits_{i = 1}^K {\Gamma \left( {\tilde \alpha _i } \right)} } \right) \cdot  {\prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i} } {\alpha _i} } }
\left(4\right) \qquad I_3  = \int {\theta  \cdot \ln \left( {\Gamma \left( {\sum\limits_{i = 1}^K {\alpha _i } } \right)} \right)dx = } \ln \left( {\Gamma \left( {\sum\limits_{i = 1}^K {\alpha _i } } \right)} \right) \cdot \int {\theta dx}  = \ln \left( {\Gamma \left( {\sum\limits_{i = 1}^K {\alpha _i } } \right)} \right) \cdot {\prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i} } {\alpha _i } } }
\left(5\right) \qquad \tilde I_3 = \ln \left( {\Gamma \left( {\sum\limits_{i = 1}^K {\tilde \alpha _i } } \right)} \right) \cdot {\prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i} } {\alpha _i } } }

The forth integral is more tricky:

\left(6\right) \qquad I_4  = \int {\theta  \cdot \ln \left( {\prod\limits_{i = 1}^K {x_i^{\alpha _i  - 1} } } \right)dx}

Integration of the Fourth Term

We exploit the logarithm of the product to break this integral up even further:

\left(1\right) \qquad I_4  = \int {\theta  \cdot \ln \left( {\prod\limits_{i = 1}^K {x_i^{\alpha _i  - 1} } } \right)dx}  = \int {\theta  \cdot \ln \left( {x_1^{\alpha _1  - 1} } \right)}  + \int {\theta  \cdot \ln \left( {x_2^{\alpha _2  - 1} } \right)}  +  \ldots  + \int {\theta  \cdot \ln \left( {x_k^{\alpha _k  - 1} } \right)}

This means we have broken this down even further and are left with the simple integral:

\left(2\right) \qquad I_{4_i }  = \int {\theta  \cdot \ln \left( {x_i^{\alpha _i  - 1} } \right)dx}

This part again breaks into smaller pieces since xi is a dimension along all x components. This yields the integral:

\left(3\right) \qquad I_{4_i }  = \int {\theta  \cdot \ln } \left( {x_i^{\alpha _i  - 1} } \right)dx = \int { \ldots \int { \ldots \int {x_1^{\alpha _1  - 1}  \cdot  \ldots  \cdot x_i^{\alpha _i  - 1} \ln \left( {x_i^{\alpha _i  - 1} } \right) \cdot  \ldots  \cdot x_k^{\alpha _k  - 1} dx_1  \ldots dx_i  \ldots dx_k } } }

Since we have independent dimensions we can solve this as:

\left(4\right) \qquad I_{4_i }  = \frac{{x_1^{\alpha _1 } }} {\alpha _1} \cdot  \ldots  \cdot \frac{{x_i^{\alpha _i }  \cdot \left( {\alpha _i  \cdot \ln \left( {x_i^{\alpha _i -1} } \right) - \alpha _i + 1} \right)}} {{\alpha _i^2 }} \cdot  \ldots  \cdot \frac{{x_k^{\alpha _k } }} {\alpha _k}

Which gives us:

\left(5\right) \qquad I_{4_i }  = \frac{{\alpha _i  \cdot \ln \left( {x_i^{\alpha _i -1} } \right) - \alpha _i + 1}} {{\alpha _i }} \cdot \prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i } } {\alpha _i } }

Then we sum over all the I_{4_i } to get:

\left(6\right) \qquad I_4  = \prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i } } {\alpha _i } }  \cdot \sum\limits_{i = 1}^K {\frac{{\alpha _i  \cdot \ln \left( {x_i^{\alpha _i -1} } \right) - \alpha _i + 1}} {{\alpha _i   }}}
\left(7\right) \qquad \tilde I_4  = \prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i } } {\alpha _i} }   \cdot \sum\limits_{i = 1}^K {\frac{{\alpha _i  \cdot \ln \left( {x_i^{\tilde \alpha _i -1} } \right) - \tilde \alpha _i + 1}} {{\alpha _i   }}}

Bringing it Back Together

Plugging the individual integral solutions back in we get the general form of the solution:

\left(1\right) \qquad L =  - \frac{1} {\phi } \cdot \left( { - \tilde I_2  + \tilde I_3  + \tilde I_{4_1 }  +  \ldots  + \tilde I_{4_k } + I_2  - I_3  - I_{4_1 }  -  \ldots  - I_{4_k } } \right)

Which we can expand into a somewhat more visible, but a little more messy form as:

\left(2\right) \qquad L = \frac{1} {\phi } \cdot ( - \ln \left( {\prod\limits_{i = 1}^K {\Gamma \left( {\tilde \alpha _i } \right)} } \right) \cdot \prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i } } {\alpha _i } } + \ln \left( {\Gamma \left( {\sum\limits_{i = 1}^K {\tilde \alpha _i } } \right)} \right) \cdot \prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i } } {\alpha _i } }  +
\qquad \prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i } } {\alpha _i } }  \cdot \sum\limits_{i = 1}^K {\frac{{ \alpha _i  \cdot \ln \left( {x_i^{\tilde \alpha _i -1} } \right) - \tilde \alpha _i + 1}} {{\alpha _i }}}  +
\qquad \ln \left( {\prod\limits_{i = 1}^K {\Gamma \left( {\alpha _i } \right)} } \right) \cdot \prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i } } {\alpha _i } }  - \ln \left( {\Gamma \left( {\sum\limits_{i = 1}^K {\alpha _i } } \right)} \right) \cdot \prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i } } {\alpha _i } }  -
\qquad \prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i } } {\alpha _i } }  \cdot \sum\limits_{i = 1}^K {\frac{{\alpha _i  \cdot \ln \left( {x_i^{\alpha _i -1} } \right) - \alpha _i + 1}} {{\alpha _i }}} )

Notice that we can bring several of the larger chunks back together as φ's. We can then simplify quite a bit more to:

\left(3\right) \qquad L =  - \frac{1} {\phi } \cdot \left( {\prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i } } {\alpha _i } }  \cdot \left( {\ln \left( \frac{1} { {\tilde \phi } }\right) + \sum\limits_{i = 1}^K {\frac{{\alpha _i  \cdot \ln \left( {x_i^{\tilde \alpha _i -1} } \right) - \tilde \alpha _i + 1}} {{\alpha _i }}} } \right) - \prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i } } {\alpha _i } }  \cdot \left( { \ln \left( \frac{1} { \phi } \right) + \sum\limits_{i = 1}^K {\frac{{\alpha _i  \cdot \ln \left( {x_i^{\alpha _i -1} } \right) - \alpha _i + 1}} {{\alpha _i }}} } \right)} \right)

Indefinite Solution

To obtain the final indefinite solution we then simplify a little bit more while replacing φ symbols with the beta Β references in the original Dirichlet equation:

\left(1\right) \qquad L =  - \frac{1} {{\Beta \left(\alpha \right) }} \cdot \prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i } } {\alpha _i } }  \cdot \left( { \left( { \ln \left(  \frac{1} { {\Beta \left(\tilde \alpha \right) } }\right) + \sum\limits_{i = 1}^K {\frac{{ \alpha _i  \cdot \ln \left( {x_i^{\tilde \alpha _i -1} } \right) - \tilde \alpha _i + 1}} {{\alpha _i }}} } \right) -  \left(  {\ln \left( \frac{1} {\Beta \left(\alpha \right)}  \right) + \sum\limits_{i = 1}^K {\frac{{\alpha _i  \cdot \ln \left( {x_i^{\alpha _i -1} } \right) - \alpha _i + 1}} {{\alpha _i }}} } \right)} \right)

Additionally, one might try and solve this by moving the potentially very large numbers together. For instance, try:

\left(2\right) \qquad L_0 =  \frac{1} {{\Beta \left(\alpha \right) }} \cdot \prod\limits_{i = 1}^K {\frac {x_i^{\alpha _i } } {\alpha _i } }
\left(3\right) \qquad L_\alpha = \left( { \ln \left( {\frac{1} {{{\rm B}\left( \alpha  \right)}}} \right) + \sum\limits_{i = 1}^K {\frac{{\alpha _i  \cdot \ln \left( {x_i^{\alpha _i -1} } \right) - \alpha _i + 1}} {{\alpha _i }}} } \right)
\left(4\right) \qquad L_{\tilde \alpha }  = \left( {\ln \left( {\frac{1} {{{\rm B}\left( {\tilde \alpha } \right)}}} \right) + \sum\limits_{i = 1}^K {\frac{{ \alpha _i  \cdot \ln \left( {x_i^{\tilde \alpha _i -1} } \right) - \tilde \alpha _i + 1}} {{ \alpha _i }}} } \right)

Where we would then find that:

\left(5\right) \qquad  L = - L_0 \left( L_{\tilde \alpha} - L_\alpha \right)

The integral must now be solved within a bounded definite region since:

\left(6\right) \qquad x_1 + x_2 + \ldots + x_k = 1

In general this is a simplex bounded region. So if we are given x1,x2 and x3 the region where the integral is bounded by would be a triangle touching the corners of a 3d box.

Alternative Simplification

We can create an alternative simplification by bringing certain components together. We lose some of the visibility of the relationships, but gain some computational simplicity. First note that we can bring the Β components together as:

\left(1\right) \qquad \ln \left( {\frac{1} {{{\rm B}\left( {\tilde \alpha } \right)}}} \right) - \ln \left( {\frac{1} {{{\rm B}\left( \alpha  \right)}}} \right) = \frac{{\frac{1} {{{\rm B}\left( {\tilde \alpha } \right)}}}} {{\frac{1} {{{\rm B}\left( \alpha  \right)}}}} = \frac{{{\rm B}\left( \alpha  \right)}} {{{\rm B}\left( {\tilde \alpha } \right)}}

Next we bring the summation operations together:

\left(2\right) \qquad \sum\limits_{i = 1}^K {\frac{{\alpha _i  \cdot \ln \left( {x_i^{\tilde \alpha  - 1} } \right) - \tilde \alpha _i  + 1 - \alpha _i  \cdot \ln \left( {x_i^{\alpha  - 1} } \right) + \alpha _i  - 1}} {{\alpha _i }}}
\left(3\right) \qquad \sum\limits_{i = 1}^K {\frac{{\alpha _i  \cdot \left( {\ln \left( {x_i^{\tilde \alpha  - 1} } \right) - \ln \left( {x_i^{\alpha  - 1} } \right) + 1} \right) - \tilde \alpha _i }} {{\alpha _i }}}
\left(4\right) \qquad \sum\limits_{i = 1}^K {\left( {\ln \left( {x_i^{\tilde \alpha  - 1} } \right) - \ln \left( {x_i^{\alpha  - 1} } \right) + 1} \right) - \frac{{\tilde \alpha _i }} {{\alpha _i }}}
\left(5\right) \qquad K \cdot \sum\limits_{i = 1}^K {\ln \left( {\frac{{x_i^{\tilde \alpha  - 1} }} {{x_i^{\alpha  - 1} }}} \right) - \frac{{\tilde \alpha _i }} {{\alpha _i }}}

Which gives us a new simplified version:

\left(6\right) \qquad L =  - \frac{1} {{{\rm B}\left( \alpha  \right)}} \cdot \prod\limits_{i = 1}^K {\frac{{x_i^{\alpha _i } }} {{\alpha _i }} \cdot } \left( {\ln \left( {\frac{{{\rm B}\left( \alpha  \right)}} {{{\rm B}\left( {\tilde \alpha } \right)}}} \right) + K \cdot \sum\limits_{i = 1}^K {\ln \left( {\frac{{x_i^{\tilde \alpha _i  - 1} }} {{x_i^{\alpha _i  - 1} }}} \right) - \frac{{\tilde \alpha _i }} {{\alpha _i }}} } \right)

Definite Integral Solution

The indefinite solution will not capture the true Dirichlet PDF since the real density lies on a slice where:

\left(1\right) \qquad  x_1  + x_2  +  \ldots  + x_k  = 1

First we note that \Beta \left(\alpha \right) is the normalizing constant for θ. As such we know:

\left(2\right) \qquad {\rm B}\left( \alpha  \right) = \int {\prod\limits_{i = 1}^K {x_i^{\alpha _i  - 1} } }

This simplifies I2 and I3 greatly to:

\left(3\right) \qquad I_2  = \ln \left( {\prod\limits_{i = 1}^K {\Gamma \left( {\alpha _i } \right)} } \right) \cdot {\rm B}\left( \alpha  \right)

and

\left(4\right) \qquad I_3  = \ln \left( {\Gamma \left( {\sum\limits_{i = 1}^K {\alpha _i } } \right)} \right) \cdot {\rm B}\left( \alpha  \right)

Next we recall I_{4_i } which we stated was:

\left(5\right) \qquad I_{4_i }  = \int {\theta  \cdot \ln \left( {x_i^{\alpha _i  - 1} } \right)} dx_i

In general, this is difficult since it has the general form of a hypergeometic function. As such, methods for definite integration such as iteration fail quickly. The two dimensional solution provided by Baldi and Itti from a general solution from Gradshteyn and Ryzhik's table of integrals is:

\left(6\right) \qquad \int {\theta  \cdot \ln \left( {x^{\alpha _1  - 1} } \right)} dx = \left( {\alpha _1  - 1} \right) \cdot \frac{\partial } {{\partial \alpha _1 }}{\rm B}\left( {\alpha _1 ,\alpha _2 } \right)

Note that according to the Wikipedia (http://en.wikipedia.org/wiki/Beta_function)

\left(7\right) \qquad \frac{\partial } {{\partial x}}{\rm B}\left( {x,y} \right) = {\rm B}\left( {x,y} \right) \cdot \left( {\ln \Gamma '\left( x \right) - \ln \Gamma '\left( {x + y} \right)} \right) = {\rm B}\left( {x,y} \right) \cdot \left( {\psi \left( x \right) - \psi \left( {x + y} \right)} \right)

where ψ(x) is the digamma function.

I suspect that as we add more than variables, this relationship should hold since the logarithm component counteracts the beta component proportional to each variables contribution to the integral.

When we combine all the Integral components back together, the \Beta \left(\alpha \right) components should cancel out with the one sitting at the front of the integral in much the same way as above.

\left(8\right) \qquad L = \ln \left( {\frac{{{\rm B}\left( \alpha  \right)}} {{{\rm B}\left( {\tilde \alpha } \right)}}} \right) + \left( {\alpha _1  - \tilde \alpha _1 } \right)\left( {\psi \left( {\alpha _1 } \right) - \psi \left( {\alpha _1  +  \ldots  + \alpha _k } \right)} \right) +  \ldots  + \left( {\alpha _k  - \tilde \alpha _k } \right)\left( {\psi \left( {\alpha _k } \right) - \psi \left( {\alpha _1  +  \ldots  + \alpha _k } \right)} \right)

Comments? Suggestions? Corrections?

Comments can be added in the discussion section of this page. Also, questions can be directed to either the iLab discussion board (http://ilab.usc.edu/cgi-bin/yabb/YaBB.pl) which is monitored by the authors or via email at: mailto:mundhenk@usc.edu


Copyright © 2009 by the University of Southern California, iLab and T. Nathan Mundhenk (http://www.mundhenk.com). All Rights Reserved.