Sankhya A

# Concentration in the Generalized Chinese Restaurant Process – 2018

with A. Pereira (UFAL) and R. I. Oliveira (IMPA)

Abstract

The Generalized Chinese Restaurant Process (GCRP) describes a sequence of exchangeable random partitions of the numbers $\{1,\dots,n\}$. This process is related to the Ewens sampling model in genetics and to Bayesian nonparametric methods such as topic models. In this paper, we study the GCRP in a regime where the number of parts grows like $n^{\alpha}$ with $\alpha>0$. We prove a non-asymptotic concentration result for the number of parts of size $k=o(n^{\alpha / (2\alpha+4)}/(\log n)^{1/(2+\alpha)})$. In particular, we show that these random variables concentrate around $c_kV_{*}n^{\alpha}$, where $V_{*}n^{\alpha}$ is the asymptotic number of parts and $c_k\approx k^{-(1+\alpha)}$ is a positive value depending on $k$. We also obtain finite-$n$ bounds for the total number of parts. Our theorems complement asymptotic statements by Pitman and more recent results on large and moderate deviations by Favaro, Feng and Gao.
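The quantities in the abstract can be explored numerically. The following is a minimal sketch, not the construction used in the paper: it assumes the standard two-parameter (Pitman-Yor) seating rule with discount $0<\alpha<1$ and concentration $\theta>-\alpha$, which may be only a special case of the GCRP studied here. The function names `simulate_crp` and `parts_by_size` and the parameter `theta` are illustrative choices.

```python
import random
from collections import Counter


def simulate_crp(n, alpha, theta, seed=0):
    """Seat n customers one at a time; return the list of table (part) sizes.

    Assumed rule (two-parameter CRP): with m customers seated at k tables,
    the next customer opens a new table with probability
    (theta + k * alpha) / (m + theta), and otherwise joins table j
    with probability proportional to sizes[j] - alpha.
    """
    if not (0 < alpha < 1) or theta <= -alpha:
        raise ValueError("this sketch assumes 0 < alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    sizes = []
    for m in range(n):  # m customers are already seated
        k = len(sizes)
        # The first customer always opens a table.
        p_new = 1.0 if m == 0 else (theta + k * alpha) / (m + theta)
        if rng.random() < p_new:
            sizes.append(1)
        else:
            weights = [s - alpha for s in sizes]
            j = rng.choices(range(k), weights=weights)[0]
            sizes[j] += 1
    return sizes


def parts_by_size(sizes):
    """Map each size k to the number of parts of exactly that size."""
    return Counter(sizes)


if __name__ == "__main__":
    n, alpha, theta = 5000, 0.5, 1.0
    sizes = simulate_crp(n, alpha, theta, seed=1)
    counts = parts_by_size(sizes)
    print("total parts:", len(sizes), " n^alpha:", round(n ** alpha, 1))
    for k in range(1, 6):
        print(f"parts of size {k}: {counts[k]}")
```

Running the script for several seeds and increasing `n` gives a rough empirical view of how the total number of parts and the number of parts of each small size $k$ scale with $n^{\alpha}$. The sketch rebuilds the weight list at every step, so it is only suitable for moderate `n`.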