<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-16672217</id><updated>2012-02-16T20:36:55.036-06:00</updated><title type='text'>Daniel E. Acuna's blog</title><subtitle type='html'>I'm a PhD candidate in Computer Science and Engineering from the University of Minnesota. I work primarily in Psychology and Cognitive Science to understand human decision-making.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://blog.principiapredictiva.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16672217/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://blog.principiapredictiva.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Daniel Acuna</name><uri>http://www.blogger.com/profile/03629337800175796305</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/_LZjPx_Q3_ss/SMftCC6ye4I/AAAAAAAAAAk/4Su4YrxB6uw/S220/daniel_library.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>1</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-16672217.post-408791728022376445</id><published>2011-06-10T13:00:00.013-05:00</published><updated>2011-06-10T21:18:58.655-05:00</updated><title type='text'>Good reasons for choosing nonparametric priors</title><content type='html'>Applications of nonparametric Bayesian (NPB) statistics have 
exploded over the past ten years. That's a good sign, because it shows that the approach is useful and makes&amp;nbsp;sense. Unfortunately, at the mere mention of&amp;nbsp;NPB&amp;nbsp;statistics, some people stop listening. The main reason is that it seems &lt;i&gt;way&lt;/i&gt;&amp;nbsp;too complicated.&lt;br /&gt;&lt;br /&gt;I don't agree.&lt;br /&gt;&lt;br /&gt;In this post, I will show how useful the approach can be in one of the simplest problems: mixture models. I also provide &lt;a href="https://github.com/daniel-acuna/pymc-submodels"&gt;some code&lt;/a&gt; that you can access on GitHub.&lt;br /&gt;&lt;br /&gt;Let's say that we want to model a bunch of observations as a finite mixture of Gaussian distributions.&amp;nbsp;As we all know, one of the problems with this approach is that we have to pick how many components our mixture will have. This is an arbitrary choice and, as experience has shown me, the majority of my arbitrary choices are wrong. How can we let the data decide the number of components?&lt;br /&gt;&lt;br /&gt;This is where NPB&amp;nbsp;statistics comes to the rescue. There is one very useful nonparametric object called the Dirichlet process (DP) that contains an infinite number of degrees of freedom and essentially allows the data to decide how many of them will be realized. I will not give a detailed introduction to the DP, but you can read&amp;nbsp;&lt;a href="http://www.cs.brown.edu/~sudderth/papers/sudderthPhD.pdf"&gt;Erik Sudderth's thesis&lt;/a&gt; for an excellent one.&lt;br /&gt;&lt;br /&gt;Well, the problem with the DP is that we have to approximate this infinite-dimensional&amp;nbsp;thing by some means. 
One of the easiest approximations is based on Sethuraman's stick-breaking construction of the DP, which can be truncated at the cost of &lt;a href="http://www.jstor.org/stable/3315951"&gt;well-known approximation errors&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;One source of confusion is that, in the end, you get a distribution over the number of components used either way. That is, whether we pick the number of components or we use some truncated DP, there will be a random variable $z_\text{unique}$ that says how many components our $N$ data points are using. But, with the DP, the distribution on $z_\text{unique}$ does &lt;i&gt;not&lt;/i&gt;&amp;nbsp;depend on the truncation, except for the obvious cases.&lt;br /&gt;&lt;br /&gt;For non-believers, I coded a little example.&lt;br /&gt;&lt;br /&gt;Let's say we have the following observations: $x = \{10, 11, 12, -10, -11, -12 \}$. I hope these data make clear that there should be two mixture components, one with mean 11 and the other with mean -11.&lt;br /&gt;&lt;br /&gt;I am using a nice Python package called &lt;a href="http://code.google.com/p/pymc/"&gt;PyMC&lt;/a&gt;&amp;nbsp;to implement my models (actually, &lt;a href="https://github.com/daniel-acuna/pymc-submodels"&gt;I am building a package myself&lt;/a&gt; that will contain several common Bayesian sub-models, such as mixtures and Markov chains, within the PyMC framework). 
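&lt;br /&gt;&lt;br /&gt;To make the claim about $z_\text{unique}$ concrete before fitting anything, here is a rough NumPy sketch (not the PyMC models used for the results in this post) that simulates the &lt;i&gt;prior&lt;/i&gt; number of components used by $N = 6$ points. It assumes a concentration parameter of 1 for the truncated stick-breaking weights and a symmetric Dirichlet(1, ..., 1) prior on the finite mixture weights; those settings, and all the function names, are choices made only for this illustration.&lt;br /&gt;&lt;pre&gt;
```python
import numpy as np


def stick_breaking_weights(alpha, k_trunc, rng):
    """Draw mixture weights from a stick-breaking prior truncated at k_trunc."""
    betas = rng.beta(1.0, alpha, size=k_trunc)
    betas[-1] = 1.0  # truncation: the last piece takes whatever stick is left
    # Length of stick remaining before each break.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


def dirichlet_weights(k, rng):
    """Draw finite-mixture weights from a symmetric Dirichlet(1, ..., 1).

    This prior is an assumption made for the illustration.
    """
    return rng.dirichlet(np.ones(k))


def mean_components_used(draw_weights, n_points, n_sims, rng):
    """Monte Carlo average of the number of distinct components used."""
    counts = np.empty(n_sims)
    for s in range(n_sims):
        weights = draw_weights(rng)
        # Assign each point to a component, then count the distinct ones.
        z = rng.choice(len(weights), size=n_points, p=weights)
        counts[s] = len(np.unique(z))
    return float(counts.mean())


rng = np.random.default_rng(0)
results = {}
for k in (5, 30):
    dp = mean_components_used(
        lambda r: stick_breaking_weights(1.0, k, r), 6, 2000, rng)
    finite = mean_components_used(
        lambda r: dirichlet_weights(k, r), 6, 2000, rng)
    results["dp", k] = dp
    results["finite", k] = finite
    print(f"K = {k:2d}: truncated DP {dp:.2f}, finite mixture {finite:.2f}")
```
&lt;/pre&gt;If the claim above holds, the two truncated-DP averages should come out close to each other, while the finite-mixture average should grow with $K$. This only looks at the prior, so it is a sanity check rather than a replacement for the posterior comparison.&lt;br /&gt;&lt;br /&gt;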
You know, Python is a &lt;a href="http://scikit-learn.sourceforge.net/"&gt;good&lt;/a&gt; &lt;a href="http://numpy.scipy.org/"&gt;programming&lt;/a&gt; &lt;a href="http://pydev.org/"&gt;environment&lt;/a&gt; with a&amp;nbsp;&lt;a href="http://www.sagemath.org/index.html"&gt;healthy&lt;/a&gt; &lt;a href="http://stackoverflow.com/questions/tagged/python"&gt;community&lt;/a&gt;, and it's free (although I don't know&amp;nbsp;&lt;a href="http://en.wikipedia.org/wiki/Gratis_versus_libre"&gt;in what sense&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Let's compute the posterior of $z_\text{unique}$ in the finite mixture with different numbers of components ($K$) and in the infinite mixture with different truncation levels ($K_\text{trunc}$). For the finite mixture, I will test $p_{\text{finite},&amp;nbsp;K = 5}(z_\text{unique} \mid x)$ and&amp;nbsp;$p_{\text{finite},&amp;nbsp;K = 30}(z_\text{unique} \mid x)$; for the infinite mixture,&amp;nbsp;$p_{\text{infinite},&amp;nbsp;K_\text{trunc} = 5}(z_\text{unique} \mid x)$ and&amp;nbsp;$p_{\text{infinite},&amp;nbsp;K_\text{trunc} = 30}(z_\text{unique} \mid x)$.&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-S5IWO7so4ko/TfKWvhnj4zI/AAAAAAAAABo/UIEYL_VqDY0/s1600/dist_components_fixedmixture.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="300" src="http://2.bp.blogspot.com/-S5IWO7so4ko/TfKWvhnj4zI/AAAAAAAAABo/UIEYL_VqDY0/s400/dist_components_fixedmixture.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Posterior on number of components used for finite mixture model with 10 points&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" 
class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-_I1t7kVLjuo/TfKWvZYU0jI/AAAAAAAAABk/UlGvRYy5qJk/s1600/dist_components_inftymixture.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="300" src="http://1.bp.blogspot.com/-_I1t7kVLjuo/TfKWvZYU0jI/AAAAAAAAABk/UlGvRYy5qJk/s400/dist_components_inftymixture.png" width="400" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Posterior on number of components used for infinite mixture model with 10 points&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;The results speak for themselves. For the finite mixture, the posterior on the number of components used depends on the number of components in the model. This is not the case for the infinite mixture model: at both truncation levels, the posterior is about the same.&lt;br /&gt;&lt;br /&gt;But why?&lt;br /&gt;&lt;br /&gt;The explanation rests on the interpretation of Bayesian statistics as Occam's razor. The fixed number of components in the finite mixture model has no Bayesian&amp;nbsp;interpretation, or at least there is no distribution over it. In the DP, however, there is, at some level, a distribution over the number of degrees of freedom, and therefore more&amp;nbsp;parsimonious&amp;nbsp;models are preferred. (Actually, the DP is &lt;i&gt;not&lt;/i&gt;&amp;nbsp;a distribution on the number of parameters. It is a little bit more complicated than that. 
There is always an infinite number of parameters, but given the observations, only finitely many of them get realized.)</content><link rel='replies' type='application/atom+xml' href='http://blog.principiapredictiva.com/feeds/408791728022376445/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16672217&amp;postID=408791728022376445' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16672217/posts/default/408791728022376445'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16672217/posts/default/408791728022376445'/><link rel='alternate' type='text/html' href='http://blog.principiapredictiva.com/2010/10/math-test.html' title='Good reasons for choosing nonparametric priors'/><author><name>Daniel Acuna</name><uri>http://www.blogger.com/profile/03629337800175796305</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/_LZjPx_Q3_ss/SMftCC6ye4I/AAAAAAAAAAk/4Su4YrxB6uw/S220/daniel_library.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-S5IWO7so4ko/TfKWvhnj4zI/AAAAAAAAABo/UIEYL_VqDY0/s72-c/dist_components_fixedmixture.png' height='72' width='72'/><thr:total>2</thr:total></entry></feed>
