I don't agree.
In this post, I will show how useful the approach can be in one of the simplest problems: mixture models. I also provide some code that you can access on GitHub.
Let's say that we want to model a bunch of observations as a finite mixture of Gaussian distributions. As we all know, one of the problems with this approach is that we have to pick how many components our mixture will have. This is an arbitrary choice and, as experience has shown me, the majority of my arbitrary choices are wrong. How can we let the data decide the number of components?
This is where NPB statistics comes to the rescue. There is a very useful nonparametric object called the Dirichlet process (DP) that has an infinite number of degrees of freedom and essentially lets the data decide how many of them are realized. I will not give a detailed introduction to the DP, but you can read Erik Sudderth's thesis for an excellent one.
Well, the problem with the DP is that we have to approximate this infinite-dimensional thing by some means. One of the easiest approximations is based on Sethuraman's stick-breaking construction of the DP, which can be truncated at the cost of a well-understood approximation error.
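To make the truncation concrete, here is a minimal sketch of a truncated stick-breaking draw in plain Python. It is not the PyMC code from the post: the function name `truncated_stick_breaking` and the concentration value `alpha = 1.0` are my own assumptions for illustration. Each step breaks a $\text{Beta}(1, \alpha)$ fraction off what is left of a unit-length stick, and the last component simply takes whatever remains so that the weights sum to one.

```python
import random


def truncated_stick_breaking(alpha, k_trunc, rng):
    """Draw mixture weights from a stick-breaking prior truncated at k_trunc.

    alpha is the DP concentration parameter (an assumed value here).
    """
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(k_trunc - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last component absorbs the leftover mass
    return weights


rng = random.Random(0)
weights = truncated_stick_breaking(alpha=1.0, k_trunc=5, rng=rng)
print(weights)
```

The leftover mass assigned to the last component is what the truncation distorts, and it shrinks quickly as `k_trunc` grows.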
One source of confusion is that, in the end, you get a distribution over the number of components used anyway. That is, whether we pick the number of components or use a truncated DP, there will be a random variable $z_\text{unique}$ that says how many components our $N$ data points are using. But, with the DP, the distribution on $z_\text{unique}$ does not depend on the truncation, except for the obvious cases.
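A quick way to get a feel for this is to simulate $z_\text{unique}$ under a truncated stick-breaking prior at two truncation levels. The sketch below is plain Python, not the PyMC model from the post, and it only looks at the prior (no data involved), so it is an illustration of the idea rather than a reproduction of the posterior result. The helper names, `alpha = 1.0`, and the number of draws are all assumptions of mine.

```python
import random
from collections import Counter


def stick_breaking_weights(alpha, k_trunc, rng):
    """Truncated stick-breaking weights; the last one takes the leftover mass."""
    weights, remaining = [], 1.0
    for _ in range(k_trunc - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def sample_z_unique(n_points, alpha, k_trunc, rng):
    """Assign n_points to components and count how many distinct ones are used."""
    weights = stick_breaking_weights(alpha, k_trunc, rng)
    z = rng.choices(range(k_trunc), weights=weights, k=n_points)
    return len(set(z))


def z_unique_prior(n_points, alpha, k_trunc, n_draws, seed=0):
    """Monte Carlo estimate of the prior distribution of z_unique."""
    rng = random.Random(seed)
    counts = Counter(
        sample_z_unique(n_points, alpha, k_trunc, rng) for _ in range(n_draws)
    )
    return {k: counts[k] / n_draws for k in sorted(counts)}


def mean(dist):
    return sum(k * p for k, p in dist.items())


# N = 6 points, as in the toy data set; alpha = 1.0 is an assumed value.
prior_5 = z_unique_prior(n_points=6, alpha=1.0, k_trunc=5, n_draws=20000)
prior_30 = z_unique_prior(n_points=6, alpha=1.0, k_trunc=30, n_draws=20000)
print(prior_5, mean(prior_5))
print(prior_30, mean(prior_30))
```

For an untruncated DP with $\alpha = 1$ and $N = 6$, the expected number of occupied components is $\sum_{i=1}^{6} 1/i = 2.45$, so the two estimated means should both land near that value.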
For non-believers, I coded a little example.
Let's say we have the following observations: $x = \{10, 11, 12, -10, -11, -12\}$. I hope these data make it clear that there should be 2 mixture components, one with mean 11 and the other with mean -11.
I am using a nice Python package called PyMC to implement my models (actually, I am building a package myself that will contain several common Bayesian sub-models, such as mixtures and Markov chains, within the PyMC framework). You know, Python is a good programming environment with a healthy community, and it's free (although I don't know in what sense).
Let's say we compute the posterior of $z_\text{unique}$ in the finite mixture with different numbers of components ($K$) and in the infinite mixture with different truncation levels ($K_\text{trunc}$). For the finite mixture, I will test $p_{\text{finite}, K = 5}(z_\text{unique} \mid x)$ and $p_{\text{finite}, K = 30}(z_\text{unique} \mid x)$. For the infinite mixture, I will test $p_{\text{infinite}, K_\text{trunc} = 5}(z_\text{unique} \mid x)$ and $p_{\text{infinite}, K_\text{trunc} = 30}(z_\text{unique} \mid x)$.
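Whatever sampler produces the component assignments, turning them into a posterior on $z_\text{unique}$ is just a matter of counting distinct labels per sample. Here is a small sketch of that step; the helper name `z_unique_distribution` is hypothetical, and the `samples` list is made up for illustration, not actual MCMC output from my models.

```python
from collections import Counter


def z_unique_distribution(assignment_samples):
    """Empirical distribution of the number of distinct components per sample.

    assignment_samples: a list of samples, each a list with one component
    label per data point.
    """
    counts = Counter(len(set(z)) for z in assignment_samples)
    total = len(assignment_samples)
    return {k: counts[k] / total for k in sorted(counts)}


# Made-up assignment samples for the 6 observations (NOT real sampler output).
samples = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 2, 1, 1, 1],
    [3, 3, 3, 3, 3, 3],
]
posterior = z_unique_distribution(samples)
print(posterior)  # {1: 0.25, 2: 0.5, 3: 0.25}
```

Note that only the number of distinct labels matters here, so label switching between samples does not affect the result.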
[Figure: Posterior on number of components used for finite mixture model with 10 points]

[Figure: Posterior on number of components used for infinite mixture model with 10 points]
But why?
The explanation is based on the interpretation of Bayesian statistics as Occam's razor. The fixed number of components in the finite mixture model does not have a Bayesian interpretation, or at least there is no distribution on it. However, in the DP, at some level, there is a distribution on the number of degrees of freedom, and therefore more parsimonious models will be preferred. (Actually, the DP is not a distribution on the number of parameters; it is a little more complicated than that. There is always an infinite number of parameters but, given the observations, only a finite number of them get realized.)

