The International Journal of Biostatistics
Copyright (c) 2009 Berkeley Electronic Press. All rights reserved.
http://www.bepress.com/ijb
Recent documents in The International Journal of Biostatistics
Thu, 02 Jul 2009 11:29:24 PDT

Semiparametrically Efficient Estimation of Conditional Instrumental Variables Parameters
http://www.bepress.com/ijb/vol5/iss1/22
Tue, 30 Jun 2009 19:24:51 PDT
In this paper, I propose a set of parameters designed to identify the slope of structural relationships based on a combination of conditioning on covariates and the use of an exogenous instrument. After giving structural interpretations to these parameters in the context of specific semiparametric models, I derive their efficient influence curves in a fully nonparametric context as well as under imposition of restrictions on the instrument. These influence curves give the semiparametric efficiency bounds for regular asymptotically linear estimators of the parameters and allow the construction of asymptotically efficient estimators. Monte Carlo experiments finally demonstrate the good finite sample performance of such estimators.
Maximilian Kasy
Statistical Theory and Methods

Mixed-Effects Poisson Regression Models for Meta-Analysis of Follow-Up Studies with Constant or Varying Durations
http://www.bepress.com/ijb/vol5/iss1/21
Fri, 26 Jun 2009 11:52:46 PDT
We present a framework for meta-analysis of follow-up studies with constant or varying duration using the binary nature of the data directly. We use a generalized linear mixed model framework with the Poisson likelihood and the log link function. We fit models with fixed and random study effects using Stata for performing meta-analysis of follow-up studies with constant or varying duration.
The methods that we present are capable of estimating all the effect measures that are widely used in such studies, such as the Risk Ratio, the Risk Difference (in case of studies with constant duration), as well as the Incidence Rate Ratio and the Incidence Rate Difference (for studies of varying duration). The methodology presented here naturally extends previously published methods for meta-analysis of binary data in a generalized linear mixed model framework using the Poisson likelihood. Simulation results suggest that the method is uniformly more powerful compared to summary based methods, in particular when the event rate is low and the number of studies is small. The methods were applied in several already published meta-analyses with very encouraging results. The methods are also directly applicable to individual patients' data, offering advanced options for modeling heterogeneity and confounders. Extensions of the models for more complex situations, such as competing risks models or recurrent events, are also discussed. The methods can be implemented in standard statistical software and illustrative code in Stata is given in the appendix.
Pantelis G. Bagos
Clinical Epidemiology, Clinical Trials, Epidemiology, General Biostatistics, Multivariate Analysis, Statistical Models

Optimal Sufficient Statistics for Parametric and Non-Parametric Multiple Simultaneous Hypothesis Testing
http://www.bepress.com/ijb/vol5/iss1/20
Tue, 23 Jun 2009 13:45:27 PDT
In multiple simultaneous hypothesis testing (MSHT), a significance thresholding function as a scalar statistic can be designed in an adaptive manner by sharing information among many tests performed simultaneously. By using such an adapted statistic, MSHT has greater detection power than tests using simple individual statistics.
To systematically obtain an optimal thresholding function that maximizes the detection power in MSHT, Storey (2007) proposed a theoretical framework called the optimal discovery procedure (ODP). He also proposed an empirical estimation of the ODP thresholding function for a parametric MSHT that presupposes parametric forms of the null and alternative likelihood functions. Empirical Bayesian testing (Efron et al. 2001), which is based on a non-parametric treatment of arbitrary test statistics, has sometimes exhibited comparable power to the ODP. These two MSHT frameworks appear to be closely related but, because of differences in their approach (frequentist vs. Bayesian), the relationship is not well understood. We present the new concept of an optimal sufficient statistic that links the ODP and empirical Bayesian frameworks, and we show that the local false discovery rate based on the empirical Bayes can be an optimal thresholding function if a certain condition holds. We lay out exhaustive sets of presumptions to achieve optimal thresholding functions and show that, if an optimal thresholding function is derived for a parametric MSHT problem, it is still optimal for a more general and broader range of MSHT problems defined in a non- or semi-parametric way. A guide to designing optimal thresholding functions for general MSHT problems is thus provided by our study.
Shigeyuki Oba
Microarrays, Statistical Theory and Methods

A Simulation Study of the Validity and Efficiency of Design-Adaptive Allocation to Two Groups in the Regression Situation
http://www.bepress.com/ijb/vol5/iss1/19
Fri, 29 May 2009 12:08:48 PDT
Dynamic allocation of participants to treatments in a clinical trial has been an alternative to randomization for nearly 35 years. Design-adaptive allocation is a particularly flexible kind of dynamic allocation.
Every investigation of dynamic allocation methods has shown that they improve balance of prognostic factors across treatment groups, but there have been lingering doubts about their influence on the validity of statistical inferences. Here we report the results of a simulation study focused on this and similar issues. Overall, it is found that there are no statistical reasons, in the situations studied, to prefer randomization to design-adaptive allocation. Specifically, there is no evidence of bias, the number of participants wasted by randomization in small studies is not trivial, and when the aim is to place bounds on the prediction of population benefits, randomization is quite substantially less efficient than design-adaptive allocation. A new, adjusted permutation estimate of the standard deviation of the regression estimator under design-adaptive allocation is shown to be an unbiased estimate of the true sampling standard deviation, resolving a long-standing problem with dynamic allocations. These results are shown in situations with varying numbers of balancing factors, different treatment and covariate effects, different covariate distributions, and in the presence of a small number of outliers.
Mikel Aickin
Clinical Trials

Likelihood Estimation of Conjugacy Relationships in Linear Models with Applications to High-Throughput Genomics
http://www.bepress.com/ijb/vol5/iss1/18
Fri, 29 May 2009 12:08:42 PDT
In the simultaneous estimation of a large number of related quantities, multilevel models provide a formal mechanism for efficiently making use of the ensemble of information for deriving individual estimates. In this article we investigate the ability of the likelihood to identify the relationship between signal and noise in multilevel linear mixed models. Specifically, we consider the ability of the likelihood to diagnose conjugacy or independence between the signals and noises.
Our work was motivated by the analysis of data from high-throughput experiments in genomics. The proposed model leads to a more flexible family. However, we further demonstrate that adequately capitalizing on the benefits of a well-fitting, fully specified likelihood in terms of gene ranking is difficult.
Brian S. Caffo
Genetics

Measuring Agreement about Ranked Decision Choices for a Single Subject
http://www.bepress.com/ijb/vol5/iss1/17
Thu, 28 May 2009 10:49:50 PDT
Introduction. When faced with a medical classification, clinicians often rank-order the likelihood of potential diagnoses, treatment choices, or prognoses as a way to focus on likely occurrences without dropping rarer ones from consideration. To know how well clinicians agree on such rankings might help extend the realm of clinical judgment farther into the purview of evidence-based medicine. If rankings by different clinicians agree better than chance, the order of assignments and their relative likelihoods may justifiably contribute to medical decisions. If the agreement is no better than chance, the ranking should not influence the medical decision. Background. Available rank-order methods measure agreement over a set of decision choices by two rankers or by a set of rankers over two choices (rank correlation methods), or an overall agreement over a set of choices by a set of rankers (Kendall's W), but will not measure agreement about a single decision choice across a set of rankers. Rating methods (e.g., kappa) assign multiple subjects to nominal categories rather than ranking possible choices about a single subject and will not measure agreement about a single decision choice across a set of rankers. Method. In this article, we pose an agreement coefficient A for measuring agreement among a set of clinicians about a single decision choice and compare several potential forms of A.
A takes on the value 0 when agreement is random and 1 when agreement is perfect. It is shown that A = 1 - (observed disagreement)/(maximum disagreement). A particular form of A is recommended and tables of 5% and 10% significance values of A are generated for common numbers of ranks and rankers. Examples. In the selection of potential treatment assignments by a Tumor Board to a patient with a neck mass, there is no significant agreement about any treatment. Another example involves ranking decisions about a proposed medical research protocol by an Institutional Review Board (IRB). The decision to pass a protocol with minor revisions shows agreement at the 5% significance level, adequate for a consistent decision.
Robert H. Riffenburgh
Statistical Theory and Methods

Modelling and Assessing Differential Gene Expression Using the Alpha Stable Distribution
http://www.bepress.com/ijb/vol5/iss1/16
Wed, 13 May 2009 12:00:46 PDT
After normalization, the distributions of gene expression for very different organisms have a similar shape, usually exhibit heavier tails than a Gaussian distribution, and have a certain degree of asymmetry. Therefore, this distribution has been modeled in the literature using different parametric families of distributions, such as the Asymmetric Laplace or the Cauchy distribution. Moreover, it is known that the tails of spot-intensity distributions are described by a power law and the variance of a given array increases with the number of genes. These features of the distribution of gene expression strongly suggest that the alpha-stable distribution is suitable to model it. In this work, we model the error distribution for gene expression data using the alpha-stable distribution. This distribution is tested successfully for four different datasets. The Kullback-Leibler, Chi-square and Hellinger tests are performed to compare how alpha-stable, Asymmetric Laplace and Gaussian fit the spot intensity distribution.
The alpha-stable distribution is shown to perform much better for every array in every dataset considered. Furthermore, using an alpha-stable mixture model, a Bayesian log-posterior odds is calculated, allowing us to decide whether a gene is differentially expressed or not. This statistic is based on the Scale Mixture of Normals and other well known properties of the alpha-stable distribution. The proposed methodology is illustrated using simulated data and the results are compared with the other existing statistical approach.
Diego Salas-Gonzalez
Microarrays, Statistical Models

Power for Testing Multiple Instances of the Two One-Sided Tests Procedure
http://www.bepress.com/ijb/vol5/iss1/15
Thu, 07 May 2009 13:47:21 PDT
The two one-sided tests procedure is used to test the equivalence of two measurements taken under different conditions. For example, two formulations of a drug are said to be bioequivalent if the average blood levels of the drug over time (AUC) are similar for the two formulations. In some studies there may be more than one parameter to test, such as a drug's AUC and maximum concentration, Cmax, or AUCs from a parent drug and a metabolite. The power of testing two or more equivalence hypotheses simultaneously is less than the power to test any one hypothesis separately, and depends on the correlations of the measurements. This paper develops an exact mathematical formula for the power for two or more simultaneous comparisons for normally distributed variables when several comparisons are evaluated separately. The formula requires numerical integration with respect to the variance-covariance terms. These terms are distributed according to the Wishart distribution, and are integrated over a subset of positive-definite matrices defined by the equivalence criteria. An R program for the case of two comparisons is included.
Kem F. Phillips
Clinical Trials, General Biostatistics

A Nearly Exhaustive Search for CpG Islands on Whole Chromosomes
http://www.bepress.com/ijb/vol5/iss1/14
Thu, 07 May 2009 13:47:16 PDT
CpG islands are genome subsequences with an unexpectedly high number of CG di-nucleotides. They are typically identified using filtering criteria (e.g., G+C%, expected vs. observed CpG ratio, and length) and are computed using sliding window methods. Most such studies mistakenly assume that an exhaustive search for CpG islands is achieved on the genome sequence of interest. We devise a Lexis diagram and explicitly show that filtering criteria-based definitions of CpG islands are mathematically incomplete and non-operational. These facts imply that the sliding window methods frequently fail to identify a large percentage of subsequences that meet the filtering criteria. We also demonstrate that an exhaustive search is computationally expensive. We develop the Hierarchical Factor Segmentation (HFS) algorithm, a pattern recognition technique with an adaptive model selection device to overcome the incompleteness and non-operational drawbacks, and to achieve effective computations for identifying CpG islands. The concept of a CpG island “core” is introduced and computed using the HFS algorithm, which is independent of any specific filtering criteria. Upon such a CpG island “core,” a CpG island is constructed using a Lexis diagram. This two-step computational approach provides a nearly exhaustive search for CpG islands that can be practically implemented on whole chromosomes. In a simulation study realistically mimicking CpG island dynamics through a Hidden Markov Model, we demonstrate that this approach retains very high sensitivity and specificity, that is, very low rates of false positives and false negatives. Finally, we apply the HFS algorithm to identify CpG island cores on human chromosome 21.
Fushing Hsieh
Computational Biology/Bioinformatics

Type I Error Rates, Coverage of Confidence Intervals, and Variance Estimation in Propensity-Score Matched Analyses
http://www.bepress.com/ijb/vol5/iss1/13
Tue, 14 Apr 2009 11:00:03 PDT
Propensity-score matching is frequently used in the medical literature to reduce or eliminate the effect of treatment selection bias when estimating the effect of treatments or exposures on outcomes using observational data. In propensity-score matching, pairs of treated and untreated subjects with similar propensity scores are formed. Recent systematic reviews of the use of propensity-score matching found that the large majority of researchers ignore the matched nature of the propensity-score matched sample when estimating the statistical significance of the treatment effect. We conducted a series of Monte Carlo simulations to examine the impact of ignoring the matched nature of the propensity-score matched sample on Type I error rates, coverage of confidence intervals, and variance estimation of the treatment effect. We examined estimating differences in means, relative risks, odds ratios, rate ratios from Poisson models, and hazard ratios from Cox regression models. We demonstrated that accounting for the matched nature of the propensity-score matched sample tended to result in Type I error rates that were closer to the advertised level compared to when matching was not incorporated into the analyses. Similarly, accounting for the matched nature of the sample tended to result in confidence intervals with coverage rates that were closer to the nominal level, compared to when matching was not taken into account. Finally, accounting for the matched nature of the sample resulted in estimates of standard error that more closely reflected the sampling variability of the treatment effect compared to when matching was not taken into account.
Peter C. Austin
General Biostatistics, Health Services Research
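
As a rough illustration of the distinction drawn in the propensity-score matching abstract above, the sketch below computes the standard error of a difference in means in two ways: once treating the treated and untreated subjects as independent samples, and once using within-pair differences so that the matching is taken into account. This is not the paper's simulation code. The outcome values are made up for illustration, and the function names `se_independent` and `se_paired` are hypothetical. It covers only the differences-in-means case mentioned in the abstract.

```python
import math
import statistics


def se_independent(treated, control):
    """Standard error of the difference in means, ignoring the matching
    (the two groups are treated as independent samples)."""
    var_t = statistics.variance(treated) / len(treated)
    var_c = statistics.variance(control) / len(control)
    return math.sqrt(var_t + var_c)


def se_paired(treated, control):
    """Standard error of the mean within-pair difference, which
    accounts for the matched nature of the sample."""
    diffs = [t - c for t, c in zip(treated, control)]
    return statistics.stdev(diffs) / math.sqrt(len(diffs))


# Hypothetical outcomes for six matched pairs (illustrative values only).
# Position i in each list belongs to the same matched pair.
treated = [5.1, 6.3, 7.2, 8.0, 9.4, 10.1]
control = [4.8, 6.0, 7.5, 7.6, 9.0, 9.9]

# The point estimate is the same under both analyses.
mean_diff = statistics.mean(treated) - statistics.mean(control)

print("difference in means:", mean_diff)
print("SE ignoring matching:", se_independent(treated, control))
print("SE accounting for matching:", se_paired(treated, control))
```

In this toy data the outcomes within a pair are strongly positively correlated, so the two analyses share a point estimate but give noticeably different standard errors. How much they differ in practice depends on the within-pair correlation, which is why the abstract relies on Monte Carlo simulation rather than a single example.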