- Quantification and Visualization of LD Patterns and Identification of Haplotype Blocks
- Abstract:
- Classical measures of linkage disequilibrium (LD) between two loci,
based only on the joint distribution of alleles at these loci, present
noisy patterns. In this paper, we propose a new
distance-based LD measure, R, which takes into account multilocus
haplotypes around the two loci in order to exploit information from
neighboring loci. The LD measure R yields a matrix of pairwise
distances between markers, based on the correlation between the lengths
of shared haplotypes among chromosomes around these markers. Data analysis
demonstrates that visualization of LD patterns through the R matrix
reveals more deterministic patterns, with much less noise, than using
classical LD measures. Moreover, the patterns are highly compatible with
recently suggested models of haplotype block structure. We propose to
apply the new LD measure to define haplotype blocks through cluster
analysis. Specifically, we present a distance-based clustering
algorithm, DHPBlocker, which performs hierarchical
partitioning of an ordered sequence of markers into disjoint and
adjacent blocks with a hierarchical structure. The proposed method
integrates the two main existing criteria for
defining haplotype blocks, namely, LD and haplotype diversity, through
the use of silhouette width and description length as cluster validity
measures, respectively. The new LD measure and clustering procedure
are applied to single nucleotide polymorphism (SNP) datasets from the
human 5q31 region (Daly et al. 2001) and the class II region of the
human major histocompatibility complex (Jeffreys et al. 2001). Our
results are in good agreement with published results. In addition,
analyses performed on different subsets of
markers indicate that the method is robust with respect to the allele
frequency and density of the genotyped markers. Unlike previously
proposed methods, our new cluster-based method can uncover
hierarchical relationships among blocks and can be applied to
polymorphic DNA markers or amino acid sequence data.
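For context, the "classical" pairwise measures that the abstract contrasts with R can be illustrated with the standard r^2 statistic, which uses only the joint allele distribution at two loci. The sketch below is not the authors' R measure or the DHPBlocker algorithm; the function names (`r_squared`, `r2_matrix`) and the toy haplotype data are hypothetical, and it assumes phased, biallelic markers coded 0/1.

```python
def r_squared(hap_a, hap_b):
    """Classical r^2 between two biallelic loci (illustrative helper).

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per chromosome.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n                                  # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                                  # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n   # joint frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                               # undefined for a monomorphic locus
    return d * d / denom


def r2_matrix(haplotypes):
    """Pairwise r^2 for every marker pair; rows are chromosomes, columns are markers."""
    columns = list(zip(*haplotypes))
    return [[r_squared(ci, cj) for cj in columns] for ci in columns]


# Toy data (made up for illustration): 4 chromosomes, 3 markers.
toy_haplotypes = [
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
]
matrix = r2_matrix(toy_haplotypes)
```

In this toy example, markers 1 and 2 carry identical alleles on every chromosome, so their r^2 is 1, while marker 3 is independent of marker 1, so their r^2 is 0. Each entry depends on two loci only, which is the limitation the multilocus, haplotype-sharing measure R is designed to address.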
- Subject Area:
- Human Genetics, Multivariate Analysis, Statistical Theory and Methods
- Suggested Citation:
- Yan Wang and Sandrine Dudoit,
"Quantification and Visualization of LD Patterns and Identification of Haplotype Blocks"
(June 2004).
U.C. Berkeley Division of Biostatistics Working Paper Series.
Working Paper 150.
http://www.bepress.com/ucbbiostat/paper150