Confidence Intervals for Negative Binomial Random Variables of High Dispersion
- Abstract:
- This paper considers the problem of constructing confidence intervals for the mean of a Negative Binomial random variable from sampled data. When the sample size is large, one traditionally relies on a Normal approximation to construct these intervals. However, we demonstrate that the sample mean of highly dispersed Negative Binomials converges in distribution to the Normal only slowly as the sample size grows. As a result, standard techniques for constructing confidence intervals for the mean (such as the Normal approximation and the bootstrap) typically produce intervals that are too narrow and substantially undercover when dispersion is high. To address this problem, we use confidence intervals constructed from Bernstein's inequality as an alternative to standard methods when the sample size is small and the dispersion is high. We also propose, and provide empirical evidence for, a chi-square model as an approximate distribution for the sample mean of Negative Binomial random variables of high dispersion when the mean and sample size are small. This chi-square model leads directly to an alternative method for constructing confidence intervals in this setting. We then prove a limit theorem showing that the sample mean converges in distribution to a Gamma random variable, of which the chi-square distribution is a special case. We compare the proposed methods to standard techniques in a variety of simulation experiments in terms of empirical coverage and give concrete recommendations for the settings in which particular intervals are preferred. Finally, we conduct a sensitivity analysis of the choice of the upper bound in Bernstein confidence intervals, which may offer a way to improve the coverage of this method at extreme degrees of dispersion and very small sample sizes.
We also apply the proposed methods to examples arising in the serial analysis of gene expression and in traffic flow in a communications network, illustrating the strengths and weaknesses of these procedures as well as those of standard techniques.
- Subject Area:
- General Biostatistics, Statistical Models, Statistical Theory and Methods
- Suggested Citation:
- David Shilane, Alan E. Hubbard, and S. N. Evans,
"Confidence Intervals for Negative Binomial Random Variables of High Dispersion"
(August 2008).
U.C. Berkeley Division of Biostatistics Working Paper Series.
Working Paper 242.
http://www.bepress.com/ucbbiostat/paper242
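The abstract's claim that the sample mean of highly dispersed Negative Binomials approaches the Normal only slowly can be made concrete through the skewness of the sample mean. The derivation below is a sketch, not taken from the paper. It assumes the common mean/dispersion parameterization (mean $\mu$, dispersion parameter $k$, with smaller $k$ meaning higher dispersion), which may differ from the authors' notation. It uses two standard facts: a sum of $n$ independent $\mathrm{NB}(k, p)$ variables with a common $p$ is $\mathrm{NB}(nk, p)$, and $\mathrm{NB}(r, p)$ (counting failures, mean $r(1-p)/p$) has skewness $(2-p)/\sqrt{r(1-p)}$.

```latex
\begin{aligned}
&X_i \overset{\text{iid}}{\sim} \mathrm{NB}(\mu, k), \qquad
\mathbb{E}[X_i] = \mu, \qquad
\operatorname{Var}(X_i) = \mu + \frac{\mu^2}{k}, \qquad
p = \frac{k}{k + \mu} \\[4pt]
&\sum_{i=1}^{n} X_i \sim \mathrm{NB}(nk,\, p)
\;\Longrightarrow\;
\gamma_1(\bar{X}_n) = \frac{2 - p}{\sqrt{nk\,(1 - p)}}
= \frac{2\mu + k}{\sqrt{n k \mu (\mu + k)}} \\[4pt]
&k \ll \mu \;\Longrightarrow\; \gamma_1(\bar{X}_n) \approx \frac{2}{\sqrt{nk}}
\end{aligned}
```

For a hypothetical setting with $\mu = 1$, $k = 0.01$ and $n = 30$, the exact expression gives $\gamma_1(\bar{X}_n) = 2.01/\sqrt{0.303} \approx 3.65$, more skewed than an exponential distribution (skewness 2). Under the approximation, bringing the skewness down to about 0.2 requires $nk \approx 100$, i.e. roughly $n \approx 10{,}000$ observations at $k = 0.01$. The approximate value $2/\sqrt{nk}$ is also the skewness of a Gamma distribution with shape $nk$, which is consistent with (though not a proof of) the Gamma limit described in the abstract.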
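The contrast the abstract draws between the Normal-approximation interval and an interval built from Bernstein's inequality can be explored with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation. It uses the textbook two-sided Bernstein bound $P(|\bar{X}_n - \mu| \ge t) \le 2\exp\{-nt^2/(2\sigma^2 + 2Mt/3)\}$, which requires $|X_i - \mu| \le M$. Because Negative Binomial variables are unbounded, `upper_bound` ($M$) is a user-chosen value (the abstract notes that this choice matters). The sample variance is plugged in for $\sigma^2$, which is a further assumption. The helper names `normal_ci`, `bernstein_ci` and `coverage`, and the demo values $\mu = 1$, $k = 0.01$, $n = 30$, `upper_bound = 50.0`, are all hypothetical. Draws use NumPy's `Generator.negative_binomial(n, p)` with $n = k$ and $p = k/(k + \mu)$, so that the mean is $\mu$.

```python
import numpy as np
from statistics import NormalDist


def normal_ci(x, alpha=0.05):
    """Normal-approximation interval: xbar +/- z_{1-alpha/2} * s / sqrt(n)."""
    n = len(x)
    xbar = float(np.mean(x))
    s = float(np.std(x, ddof=1))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * s / np.sqrt(n)
    return xbar - half, xbar + half


def bernstein_ci(x, upper_bound, alpha=0.05):
    """Interval obtained by solving 2*exp(-n t^2 / (2 s^2 + 2 M t / 3)) = alpha for t.

    Assumes |X_i - mu| <= M, with M = upper_bound chosen by the user (hypothetical),
    and plugs in the sample variance s^2 for the unknown variance.
    """
    n = len(x)
    xbar = float(np.mean(x))
    var = float(np.var(x, ddof=1))
    log_term = np.log(2 / alpha)
    b = 2 * upper_bound * log_term / 3
    # Positive root of n t^2 - b t - 2 var log_term = 0.
    half = (b + np.sqrt(b**2 + 8 * n * var * log_term)) / (2 * n)
    return xbar - half, xbar + half


def coverage(mu, k, n, reps, upper_bound, alpha=0.05, seed=0):
    """Empirical coverage of both intervals for NB data with mean mu, dispersion k."""
    rng = np.random.default_rng(seed)
    p = k / (k + mu)  # NumPy's NB(k, p) counts failures; its mean k(1-p)/p equals mu
    hits_normal = 0
    hits_bern = 0
    for _ in range(reps):
        x = rng.negative_binomial(k, p, size=n)
        lo, hi = normal_ci(x, alpha)
        hits_normal += int(lo <= mu <= hi)
        lo, hi = bernstein_ci(x, upper_bound, alpha)
        hits_bern += int(lo <= mu <= hi)
    return hits_normal / reps, hits_bern / reps


if __name__ == "__main__":
    # Hypothetical high-dispersion setting; no particular coverage values are claimed.
    cov_normal, cov_bern = coverage(mu=1.0, k=0.01, n=30, reps=2000, upper_bound=50.0)
    print(f"Normal: {cov_normal:.3f}  Bernstein: {cov_bern:.3f}")
```

In this sketch the Bernstein half-width is always at least $s\sqrt{2\ln(2/\alpha)/n}$, and $\sqrt{2\ln 40} \approx 2.72 > 1.96$ at $\alpha = 0.05$. The Bernstein interval therefore always contains the Normal one, so its empirical coverage can never be lower on the same simulated samples. The price is width, which grows with the assumed `upper_bound`. Since the data are nonnegative counts, either lower limit could also be truncated at 0 without changing coverage.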