Dirichlet Kullback-Leibler
From ILabWiki
Derivation of the Kullback-Leibler (KL) distance for the Dirichlet Probability Distribution
Work In Progress
This is a work in progress. Please feel free to add to it or make comments.
The Dirichlet Probability Distribution (PDF)
The Dirichlet PDF (http://en.wikipedia.org/wiki/Dirichlet_distribution) is the multivariate generalization of the Beta PDF, and it relates to the Multinomial PDF in the same way that the Beta PDF relates to the Binomial PDF (http://en.wikipedia.org/wiki/Binomial_distribution): it is defined over more than two possible outcomes. It can also be obtained from the Gamma PDF by normalizing independent Gamma variates whose rate parameter β is fixed at 1. It is given by the formula:
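In its standard form, writing K for the number of outcomes (a symbol assumed here, not taken from this page), the density on the region where the components of x are non-negative and sum to one can be written as:

```latex
\[
p(x \mid \alpha) \;=\; \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1},
\qquad x_i \ge 0, \quad \sum_{i=1}^{K} x_i = 1 .
\]
```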
where the denominator B(α) is the normalizing multinomial beta function, given as:
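The standard definition of the multinomial beta function in terms of the gamma function Γ, again with K as an assumed symbol for the number of outcomes, is:

```latex
\[
B(\alpha) \;=\; \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\!\left(\sum_{i=1}^{K} \alpha_i\right)} .
\]
```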
The Kullback-Leibler Divergence/Distance (KL)
The KL distance (http://en.wikipedia.org/wiki/Kullback-Leibler_divergence) is a measure of the information distance between two PDFs. It tells us how much information, in Shannon's sense, is lost when one PDF is used to approximate the other. Note that it is not symmetric, so it is not a distance in the strict metric sense. It is given by the basic formula:
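For two continuous densities, labelled p and q here purely for illustration, the usual definition in nats reads:

```latex
\[
KL(p \,\|\, q) \;=\; \int p(x) \, \ln \frac{p(x)}{q(x)} \, dx .
\]
```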
The KL Distance for a Dirichlet PDF
To find the KL distance we will need to solve the integral given over the Dirichlet PDF. First we will introduce a simple notation as a shorthand to make the math a little cleaner.
and
Since we are integrating over x, the first step is to move the pure α terms out. So we start with:
Next we make things easier by using the logarithmic identities and expanding out the log:
which becomes:
Next we will expand a little further. The logarithm allows us to expand quite a bit into a large number of summations:
As we expand we see that
Integration of the Parts
This shows that we really only have to solve four integrals and then we are done. The first three are trivial, since we can compute the integral over the product as a multiple integral over each xi, and the remaining part contains only α and as such is a constant.
The fourth integral is trickier:
Integration of the Fourth Term
We exploit the logarithm of the product to break this integral up even further:
This means we have broken this down even further and are left with the simple integral:
This part again breaks into smaller pieces since xi is a dimension along all x components. This yields the integral:
Since we have independent dimensions we can solve this as:
Which gives us:
Then we sum over all the
to get:
Bringing it Back Together
Plugging the individual integral solutions back in we get the general form of the solution:
Which we can expand into a somewhat more explicit, though messier, form:
Notice that we can bring several of the larger chunks back together as φ's. We can then simplify quite a bit more to:
Indefinite Solution
To obtain the final indefinite solution we then simplify a little bit more while replacing φ symbols with the beta Β references in the original Dirichlet equation:
Additionally, one might try to solve this by moving the potentially very large numbers together. For instance, try:
Where we would then find that:
The integral must now be solved within a bounded definite region, since the components of x are non-negative and must sum to one.
In general this region is a simplex. For example, with three components x1, x2 and x3, the region of integration is the triangle whose corners are the points (1,0,0), (0,1,0) and (0,0,1) of the unit box.
Alternative Simplification
We can create an alternative simplification by bringing certain components together. We lose some of the visibility of the relationships, but gain some computational simplicity. First note that we can bring the Β components together as:
Next we bring the summation operations together:
Which gives us a new simplified version:
Definite Integral Solution
The indefinite solution will not capture the true Dirichlet PDF, since the real density lies on the slice where the components of x sum to one (x1 + x2 + ... = 1).
First we note that
is the normalizing constant for θ. As such we know:
This simplifies I2 and I3 greatly to:
and
Next we recall
which we stated was:
In general, this is difficult since it has the general form of a hypergeometric function. As such, methods for definite integration such as iteration fail quickly. The two-dimensional solution, which Baldi and Itti take from a general solution in Gradshteyn and Ryzhik's table of integrals, is:
Note that according to Wikipedia (http://en.wikipedia.org/wiki/Beta_function)
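The table entry in question is presumably the standard result that the integral of x^(a-1) (1-x)^(b-1) ln(x) over (0, 1) equals B(a, b) [ψ(a) − ψ(a+b)], where a and b are symbols assumed here for the two parameters. The following is a minimal Python sketch that checks this identity numerically. To avoid needing a digamma routine, it assumes b is a positive integer and uses the recurrence ψ(x+1) = ψ(x) + 1/x.

```python
import math

def beta_fn(a, b):
    """Two-parameter beta function B(a, b), computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def digamma_gap(a, b):
    """psi(a) - psi(a + b), assuming b is a positive integer.

    Follows from applying psi(x + 1) = psi(x) + 1/x exactly b times.
    """
    return -sum(1.0 / (a + k) for k in range(b))

def log_weighted_beta_integral(a, b, steps=200_000):
    """Midpoint-rule estimate of the integral of
    x^(a-1) * (1-x)^(b-1) * ln(x) over the interval (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint of the i-th sub-interval
        total += x ** (a - 1) * (1.0 - x) ** (b - 1) * math.log(x)
    return total * h

a, b = 2, 3  # illustrative parameter values only
numeric = log_weighted_beta_integral(a, b)
closed = beta_fn(a, b) * digamma_gap(a, b)
print(numeric, closed)
```

For a = 2 and b = 3 the integral can also be done by hand, term by term, and gives −13/144, which both values should reproduce to several decimal places.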
where ψ(x) is the digamma function.
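The identity referred to here is presumably the derivative of the beta function with respect to its first argument. Written with assumed symbols a and b for the two parameters, it connects to the log-weighted integral by differentiating under the integral sign:

```latex
\[
\frac{\partial}{\partial a} B(a, b) \;=\; B(a, b)\,\bigl[\psi(a) - \psi(a + b)\bigr],
\]
\[
\frac{\partial}{\partial a} \int_0^1 x^{a-1} (1 - x)^{b-1} \, dx
\;=\; \int_0^1 x^{a-1} (1 - x)^{b-1} \ln x \, dx .
\]
```

Since the integral on the left of the second line is B(a, b) itself, the two lines together give the value of the log-weighted integral.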
I suspect that as we add more than two variables, this relationship should hold, since the logarithm component counteracts the beta component in proportion to each variable's contribution to the integral.
When we combine all the integral components back together, the
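This suspicion can be probed empirically. For a Dirichlet PDF the expected value of ln(xi) is known to be ψ(αi) − ψ(α0), where α0 (a symbol assumed here) is the sum of all αi. Below is a rough Monte Carlo sketch in Python that compares this prediction with sample averages. It draws samples by normalizing independent Gamma variates, and it restricts itself to integer α values so that the digamma differences reduce to finite sums; both choices are simplifications made for this example.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha_i, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [v / s for v in g]

def digamma_diff_int(n, m):
    """psi(n) - psi(m) for positive integers n <= m (assumption of this sketch),
    using the recurrence psi(k + 1) = psi(k) + 1/k."""
    return -sum(1.0 / k for k in range(n, m))

def mean_log_components(alpha, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[ln x_i] for every component i."""
    rng = random.Random(seed)
    sums = [0.0] * len(alpha)
    for _ in range(n_samples):
        x = sample_dirichlet(alpha, rng)
        for i, xi in enumerate(x):
            sums[i] += math.log(xi)
    return [s / n_samples for s in sums]

alpha = [2, 3, 4]  # illustrative integer parameters
estimates = mean_log_components(alpha)
predicted = [digamma_diff_int(a, sum(alpha)) for a in alpha]
for est, pred in zip(estimates, predicted):
    print(f"sampled {est:.4f}   predicted {pred:.4f}")
```

Agreement within sampling noise for three variables supports the idea that the two-variable relationship carries over, although a numerical check of this kind is not a proof.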
components should cancel out with the one sitting at the front of the integral in much the same way as above.
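For comparison, the closed form usually quoted in the literature for the KL distance between two Dirichlet PDFs is ln B(β) − ln B(α) + Σ (αi − βi) [ψ(αi) − ψ(α0)], where α and β are the two parameter vectors and α0 is the sum of the αi. This expression is not taken from this page, and the names below are hypothetical; it is offered only as a reference point that the derivation here would be expected to match. A minimal Python sketch, with a small hand-rolled digamma so that only the standard library is needed:

```python
import math

def digamma(x):
    """Digamma psi(x) for x > 0: shift x up to at least 6 with the
    recurrence psi(x) = psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def log_multinomial_beta(alpha):
    """ln B(alpha) = sum_i ln Gamma(alpha_i) - ln Gamma(sum_i alpha_i)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))

def kl_dirichlet(alpha_p, alpha_q):
    """KL( Dir(alpha_p) || Dir(alpha_q) ) in nats, using the closed form above."""
    a0 = sum(alpha_p)
    # E[ln x_i] under Dir(alpha_p)
    expected_log = [digamma(a) - digamma(a0) for a in alpha_p]
    return (log_multinomial_beta(alpha_q) - log_multinomial_beta(alpha_p)
            + sum((a - b) * e for a, b, e in zip(alpha_p, alpha_q, expected_log)))

print(kl_dirichlet([1.0, 1.0], [2.0, 2.0]))
print(kl_dirichlet([2.0, 3.0, 4.0], [2.0, 3.0, 4.0]))
```

The first call compares a uniform density on (0, 1) with the density 6x(1 − x), for which the integral can be done by hand and equals 2 − ln 6. The second call compares a Dirichlet PDF with itself, so the result should be zero up to rounding.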
Comments? Suggestions? Corrections?
Comments can be added in the discussion section of this page. Also, questions can be directed to either the iLab discussion board (http://ilab.usc.edu/cgi-bin/yabb/YaBB.pl) which is monitored by the authors or via email at: mailto:mundhenk@usc.edu
