<?xml version="1.0" encoding="iso-8859-1" ?>
<rss version="2.0">
<channel>
<title>Statistical Applications in Genetics and Molecular Biology</title>
<copyright>Copyright (c) 2009 Berkeley Electronic Press All rights reserved.</copyright>
<link>http://www.bepress.com/sagmb</link>
<description>Recent documents in Statistical Applications in Genetics and Molecular Biology</description>
<language>en-us</language>
<lastBuildDate>Thu, 05 Nov 2009 23:21:19 PST</lastBuildDate>
<ttl>3600</ttl>
<item>
<title>Statistical Screening Method for Genetic Factors Influencing Susceptibility to Common Diseases in a Two-Stage Genome-Wide Association Study</title>
<link>http://www.bepress.com/sagmb/vol8/iss1/art46</link>
<guid isPermaLink="true">http://www.bepress.com/sagmb/vol8/iss1/art46</guid>
<pubDate>Wed, 04 Nov 2009 10:52:42 PST</pubDate>
<description>A genome-wide association study (GWAS) is a standard strategy for detecting disease susceptibility genes, despite unsettled controversies on many aspects, including optimal study design and statistical analysis. As for study design, a two-stage design has been applied to maximize cost-effectiveness. However, there has been little consensus on the appropriate statistical analysis for a two-stage design, leaving researchers uncertain as to which statistical measures should be applied at the first stage and how to determine the significance level of the differences at the second stage. Here, using simulation studies, we compared the statistical operating characteristics of the screening in a two-stage GWAS by taking into consideration the proper balance of false-positive and false-negative errors. As a result, the lower bound of the confidence interval for odds ratios is recommended as the first-stage measure, and the second-stage criteria should primarily depend on the purpose of the genome screen or its role in the overall gene-hunting scheme. Based on the simulation study, we suggest rules of thumb about which statistics to use in a given situation. An application of all operating characteristics of the screening method to an actual GWAS for gastric cancer illustrates the practical relevance of our discussion.</description>

<author>Yasunori Sato</author>


<category>Genetics</category>

</item>


<item>
<title>A Regularized Regression Approach for Dissecting Genetic Conflicts that Increase Disease Risk in Pregnancy</title>
<link>http://www.bepress.com/sagmb/vol8/iss1/art45</link>
<guid isPermaLink="true">http://www.bepress.com/sagmb/vol8/iss1/art45</guid>
<pubDate>Fri, 23 Oct 2009 13:42:53 PDT</pubDate>
<description>Human diseases developed during pregnancy could be caused by the direct effects of both maternal and fetal genes, and/or by the indirect effects caused by genetic conflicts. Genetic conflicts exist when the effects of fetal genes are opposed by the effects of maternal genes, or when there is a conflict between the maternal and paternal genes within the fetal genome. The two types of genetic conflicts involve the functions of different genes in different genomes and are genetically distinct. Differentiating and further dissecting the two sets of genetic conflict effects that increase disease risk during pregnancy presents statistical challenges, and the two have traditionally been pursued as separate endeavors. In this article, we develop a unified framework to model and test the two sets of genetic conflicts via a regularized regression approach. Our model is developed for realistic situations in which the paternal information is often completely missing, a setting that defeats most of the current family-based studies. A mixture model-based penalized logistic regression is proposed for data sampled from a natural population. We develop a variable selection procedure to select significant genetic features. Simulation studies show that the model has high power and good false positive control under reasonable sample sizes and disease allele frequencies. A case study of small for gestational age (SGA) is provided to show the utility of the proposed approach. Our model provides a powerful tool for dissecting genetic conflicts that increase disease risk during pregnancy, and offers a testable framework for the genetic conflict hypothesis previously proposed.</description>

<author>Shaoyu Li</author>


<category>Disease Modeling</category>

<category>Genetics</category>

<category>Statistical Models</category>

</item>


<item>
<title>Transmission Disequilibrium Test Power and Sample Size in the Presence of Locus Heterogeneity</title>
<link>http://www.bepress.com/sagmb/vol8/iss1/art44</link>
<guid isPermaLink="true">http://www.bepress.com/sagmb/vol8/iss1/art44</guid>
<pubDate>Thu, 08 Oct 2009 19:54:47 PDT</pubDate>
<description>Locus heterogeneity is one of the most important issues in gene mapping and can cause significant reductions in statistical power for gene mapping, yet no research to date has provided power and sample size calculations for family-based association methods in the presence of locus heterogeneity. The purpose of this research is three-fold: (i) to provide an analytic solution to the incorporation of locus heterogeneity into power and sample size calculations for the TDT statistic; (ii) to verify our analytic solution with simulations; and (iii) to study how different factors affect sample size requirement for the TDT in the presence of locus heterogeneity. The detection of association in the presence of locus heterogeneity requires a greater sample size than in its absence. This increase is independent of the prevalence of the disease. In addition, as the proportion of families unlinked to the disease locus increases, the sample size necessary to maintain constant power increases. Finally, as the effect size of the disease locus increases, the sample size necessary to detect association decreases in the presence of locus heterogeneity. We provide freely available software that can perform these calculations.</description>

<author>Chuanwen Chen</author>


<category>Computation</category>

<category>Design of Experiments and Sample Surveys</category>

<category>Genetics</category>

<category>Statistical Theory and Methods</category>

</item>


<item>
<title>Characterizing the D2 Statistic: Word Matches in Biological Sequences</title>
<link>http://www.bepress.com/sagmb/vol8/iss1/art43</link>
<guid isPermaLink="true">http://www.bepress.com/sagmb/vol8/iss1/art43</guid>
<pubDate>Thu, 08 Oct 2009 13:24:50 PDT</pubDate>
<description>Word matches are often used in sequence comparison methods, either as a measure of sequence similarity or in the first search steps of algorithms such as BLAST or BLAT. The D2 statistic is the number of matches of words of k letters between two sequences. Recent advances have been made in the characterization of this statistic and in the approximation of its distribution. Here, these results are extended to the case of approximate word matches. We compute the exact value of the variance of the D2 statistic for the case of a uniform letter distribution, and introduce a method to provide accurate approximations of the variance in the remaining cases. This enables the distribution of D2 to be approximated for typical situations arising in biological research. We apply these results to the identification of cis-regulatory modules, and show that this method detects such sequences with a high accuracy. The ability to approximate the distribution of D2 for both exact and approximate word matches will enable the use of this statistic in a more precise manner for sequence comparison, database searches, and identification of transcription factor binding sites.</description>

<author>Sylvain Forêt</author>


<category>Computation</category>

<category>Computational Biology/Bioinformatics</category>

<category>Statistical Theory and Methods</category>

</item>


<item>
<title>MC-Normalization: A Novel Method for Dye-Normalization of Two-Channel Microarray Data</title>
<link>http://www.bepress.com/sagmb/vol8/iss1/art42</link>
<guid isPermaLink="true">http://www.bepress.com/sagmb/vol8/iss1/art42</guid>
<pubDate>Thu, 01 Oct 2009 17:02:53 PDT</pubDate>
<description>Pre-processing plays a vital role in two-color microarray data analysis. An analysis is characterized by its ability to identify differentially expressed genes (its sensitivity) and its ability to provide unbiased estimators of the true regulation (its bias). It has been shown that microarray experiments regularly underestimate the true regulation of differentially expressed genes. We introduce the MC-normalization, where C stands for channel-wise normalization, with considerably lower bias than the commonly used standard methods. The idea behind the MC-normalization is that the channels' individual intensities determine the correction, rather than the average intensity, which is the case for the widely used MA-normalization. The two methods were evaluated using spike-in data from an in-house produced cDNA-experiment and a publicly available Agilent-experiment. The methods were applied to background corrected and non-background corrected data. For the cDNA-experiment the methods were either applied separately to data from each of the print-tips or applied to the complete array data. Altogether 24 analyses were evaluated. For each analysis the sensitivity, the bias and two variance measures were estimated. We prove that the MC-normalization has lower bias than the MA-normalization. The spike-in data confirmed the theoretical result and suggest that the difference is significant. Furthermore, the empirical data suggest that the MC- and MA-normalization have similar sensitivity. A striking result is that print-tip normalizations had considerably higher sensitivity than analyses using the complete array data.</description>

<author>Mattias Landfors</author>


<category>Computational Biology/Bioinformatics</category>

<category>Microarrays</category>

<category>Statistical Models</category>

</item>


<item>
<title>M-quantile Regression Analysis of Temporal Gene Expression Data</title>
<link>http://www.bepress.com/sagmb/vol8/iss1/art41</link>
<guid isPermaLink="true">http://www.bepress.com/sagmb/vol8/iss1/art41</guid>
<pubDate>Tue, 22 Sep 2009 09:43:02 PDT</pubDate>
<description>In this paper, we explore the use of M-quantile regression and M-quantile coefficients to detect statistical differences between temporal curves that belong to different experimental conditions. In particular, we consider the application to temporal gene expression data. Here, the aim is to detect genes whose temporal expression is significantly different across a number of biological conditions. We present a new method to approach this problem. Firstly, the temporal profiles of the genes are modelled by a parametric M-quantile regression model. This model is particularly appealing for small-sample gene expression data, as it is very robust against outliers and it does not make any assumption about the error distribution. Secondly, we further increase the robustness of the method by summarising the M-quantile regression models for a large range of quantile values into an M-quantile coefficient. Finally, we fit a polynomial M-quantile regression model to the M-quantile coefficients over time and employ a Hotelling T2-test to detect significant differences of the temporal M-quantile coefficient profiles across conditions. Extensive simulations show the increased power and robustness of M-quantile regression methods over standard regression methods and over some of the previously published methods. We conclude by applying the method to detect differentially expressed genes from time-course microarray data on muscular dystrophy.</description>

<author>Veronica Vinciotti</author>


<category>General Biostatistics</category>

<category>Microarrays</category>

<category>Statistical Models</category>

</item>


<item>
<title>Modeling Dependence in Methylation Patterns with Application to Ovarian Carcinomas</title>
<link>http://www.bepress.com/sagmb/vol8/iss1/art40</link>
<guid isPermaLink="true">http://www.bepress.com/sagmb/vol8/iss1/art40</guid>
<pubDate>Tue, 22 Sep 2009 09:42:58 PDT</pubDate>
<description>Changes in cytosine methylation at CpG nucleotides are observed in many cancers and offer great potential for translational research. Diseases such as ovarian cancer that are especially challenging to diagnose and treat are of particular interest, and abnormal methylation in the tandem repeats Sat2 and NBL2 has been observed in a collection of ovarian carcinomas. In earlier analyses of double-stranded methylation patterns in 0.2 kb regions of Sat2 and NBL2, we detected clusters of identically methylated sites in close proximity. These clusters could not be explained by random variation, and our findings suggested a high degree of site-to-site dependence. However, previously developed stochastic models for methylation change have either treated CpG sites independently or employed a context-dependent approach to adjust model parameters according to regional methylation levels. In this paper, we introduce a novel neighboring sites model as an alternative methodology for considering dependence in methylation patterns, and we compare the three models in their ability to generate simulated sequences statistically similar to our Sat2 and NBL2 carcinoma samples.</description>

<author>Michelle R. Lacey</author>


<category>Statistical Models</category>

</item>


<item>
<title>Calculating Asymptotic Significance Levels of the Constrained Likelihood Ratio Test with Application to Multivariate Genetic Linkage Analysis</title>
<link>http://www.bepress.com/sagmb/vol8/iss1/art39</link>
<guid isPermaLink="true">http://www.bepress.com/sagmb/vol8/iss1/art39</guid>
<pubDate>Thu, 17 Sep 2009 19:14:42 PDT</pubDate>
<description>The asymptotic distribution of the multivariate variance component linkage analysis likelihood ratio test has provoked some contradictory accounts in the literature. In this paper we confirm that some previous results are not correct by deriving the asymptotic distribution in one special case. It is shown that this special case is a good approximation to the distribution in many situations. We also introduce a new approach to simulating from the asymptotic distribution of the likelihood ratio test statistic in constrained testing problems. It is shown that this method is very efficient for small p-values, and is applicable even when the constraints are not convex. The method is related to a multivariate integration problem. We illustrate how the approach can be applied to multivariate linkage analysis in a simulation study. Some more philosophical issues relating to one-sided tests in variance components linkage analysis are discussed.</description>

<author>Nathan J. Morris</author>


<category>Genetics</category>

</item>


<item>
<title>A Statistical Model for Genetic Mapping of Viral Infection by Integrating Epidemiological Behavior</title>
<link>http://www.bepress.com/sagmb/vol8/iss1/art38</link>
<guid isPermaLink="true">http://www.bepress.com/sagmb/vol8/iss1/art38</guid>
<pubDate>Wed, 09 Sep 2009 13:56:14 PDT</pubDate>
<description>Large-scale studies of genetic variation may be helpful for understanding the genetic control mechanisms of viral infection and, ultimately, predicting and eliminating infectious disease outbreaks. We propose a new statistical model for detecting specific DNA sequence variants that are responsible for viral infection. This model considers additive, dominance and epistatic effects of haplotypes from three different genomes (recipient, transmitter and virus) through an epidemiological process. The model is constructed within the maximum likelihood framework and implemented with the EM algorithm. A number of hypothesis tests about population genetic structure and diversity and the pattern of genetic control are formulated. A series of closed forms for the EM algorithm to estimate haplotype frequencies and haplotype effects in a network of genetic interactions among three genomes are derived. Simulation studies were performed to test the statistical properties of the model, recommending necessary sample sizes for obtaining reasonably good accuracy and precision of parameter estimation. By integrating, for the first time, the epidemiological principle of viral infection into genetic mapping, the new model should find immediate application in studying the genetic architecture of viral infection.</description>

<author>Yao Li</author>


<category>Genetics</category>

</item>


<item>
<title>Identifying Individuals in a Complex Mixture of DNA with Unknown Ancestry</title>
<link>http://www.bepress.com/sagmb/vol8/iss1/art37</link>
<guid isPermaLink="true">http://www.bepress.com/sagmb/vol8/iss1/art37</guid>
<pubDate>Wed, 09 Sep 2009 13:56:11 PDT</pubDate>
<description>A new test was recently developed that could use a high-density set of single nucleotide polymorphisms (SNPs) to determine whether a specific individual contributed to a mixture of DNA. The test statistic compared the genotype for the individual to the allele frequencies in the mixture and to the allele frequencies in a reference group. This test requires the ancestries of the reference group to be nearly identical to those of the contributors to the mixture. Here, we first quantify the bias, the increase in type I and type II error, when the ancestries are not well matched. Then, we show that the test can also be biased if the numbers of subjects in the two groups differ or if the platforms used to measure SNP intensities differ. We then introduce a new test statistic and a test that only requires the ancestries of the reference group to be similar to that of the individual of interest, and show that this test is not only robust to the number of subjects and platform, but also has increased power of detection. The two tests are compared on both HapMap and simulated data.</description>

<author>Joshua Sampson</author>


<category>Computational Biology/Bioinformatics</category>

<category>Genetics</category>

</item>



</channel>
</rss>
