










Dan Gusfield,
Dean Hickerson and
Satish Eddhu. An efficiently computed lower bound on the number of recombinations in phylogenetic networks: Theory and empirical study. In DAM, Vol. 155(67):806830, 2007. Note: http://wwwcsif.cs.ucdavis.edu/~gusfield/cclowerbound.pdf.
Toggle abstract
"Phylogenetic networks are models of sequence evolution that go beyond trees, allowing biological operations that are not treelike. One of the most important biological operations is recombination between two sequences. An established problem [J. Hein, Reconstructing evolution of sequences subject to recombination using parsimony, Math. Biosci. 98 (1990) 185200; J. Hein, A heuristic method to reconstruct the history of sequences subject to recombination, J. Molecular Evoluation 36 (1993) 396405; Y. Song, J. Hein, Parsimonious reconstruction of sequence evolution and haplotype blocks: finding the minimum number of recombination events, in: Proceedings of 2003 Workshop on Algorithms in Bioinformatics, Berlin, Germany, 2003, Lecture Notes in Computer Science, Springer, Berlin; Y. Song, J. Hein, On the minimum number of recombination events in the evolutionary history of DNA sequences, J. Math. Biol. 48 (2003) 160186; L. Wang, K. Zhang, L. Zhang, Perfect phylogenetic networks with recombination, J. Comput. Biol. 8 (2001) 6978; S.R. Myers, R.C. Griffiths, Bounds on the minimum number of recombination events in a sample history, Genetics 163 (2003) 375394; V. Bafna, V. Bansal, Improved recombination lower bounds for haplotype data, in: Proceedings of RECOMB, 2005; Y. Song, Y. Wu, D. Gusfield, Efficient computation of close lower and upper bounds on the minimum number of needed recombinations in the evolution of biological sequences, Bioinformatics 21 (2005) i413i422. Bioinformatics (Suppl. 1), Proceedings of ISMB, 2005, D. Gusfield, S. Eddhu, C. Langley, Optimal, efficient reconstruction of phylogenetic networks with constrained recombination, J. Bioinform. Comput. Biol. 2(1) (2004) 173213; D. Gusfield, Optimal, efficient reconstruction of rootunknown phylogenetic networks with constrained and structured recombination, J. Comput. Systems Sci. 70 (2005) 381398] is to find a phylogenetic network that derives an input set of sequences, minimizing the number of recombinations used. No efficient, general algorithm is known for this problem. Several papers consider the problem of computing a lower bound on the number of recombinations needed. In this paper we establish a new, efficiently computed lower bound. This result is useful in methods to estimate the number of needed recombinations, and also to prove the optimality of algorithms for constructing phylogenetic networks under certain conditions [D. Gusfield, S. Eddhu, C. Langley, Optimal, efficient reconstruction of phylogenetic networks with constrained recombination, J. Bioinform. Comput. Biol. 2(1) (2004) 173213; D. Gusfield, Optimal, efficient reconstruction of rootunknown phylogenetic networks with constrained and structured recombination, J. Comput. Systems Sci. 70 (2005) 381398; D. Gusfield, Optimal, efficient reconstruction of rootunknown phylogenetic networks with constrained recombination, Technical Report, Department of Computer Science, University of California, Davis, CA, 2004]. The lower bound is based on a structural, combinatorial insight, using only the site conflicts and incompatibilities, and hence it is fundamental and applicable to many biological phenomena other than recombination, for example, when gene conversions or recurrent or back mutations or crossspecies hybridizations cause the phylogenetic history to deviate from a tree structure. In addition to establishing the bound, we examine its use in more complex lower bound methods, and compare the bounds obtained to those obtained by other established lower bound methods. © 2006 Elsevier B.V. All rights reserved."









Dan Gusfield,
Vikas Bansal,
Vineet Bafna and
Yun S. Song. A Decomposition Theory for Phylogenetic Networks and Incompatible Characters. In JCB, Vol. 14(10):12471272, 2007. Keywords: explicit network, from sequences, galled tree, phylogenetic network, phylogeny, Program Beagle, Program GalledTree, recombination, reconstruction, software. Note: http://www.eecs.berkeley.edu/~yss/Pub/decomposition.pdf.



Yun S. Song,
Zhihong Ding,
Dan Gusfield,
Charles Langley and
Yufeng Wu. Algorithms to Distinguish the Role of GeneConversion from SingleCrossover Recombination in the Derivation of SNP Sequences in Populations. In JCB, Vol. 14(10):12731286, 2007. Keywords: ARG, from sequences, phylogenetic network, phylogeny, Program SHRUB, reconstruction. Note: http://dx.doi.org/10.1089/cmb.2007.0096.
Toggle abstract
"Meiotic recombination is a fundamental biological event and one of the principal evolutionary forces responsible for shaping genetic variation within species. In addition to its fundamental role, recombination is central to several critical applied problems. The most important example is "association mapping" in populations, which is widely hoped to help find genes that influence genetic diseases (Carlson et al., 2004; Clark, 2003). Hence, a great deal of recent attention has focused on problems of inferring the historical derivation of sequences in populations when both mutations and recombinations have occurred. In the algorithms literature, most of that recent work has been directed to singlecrossover recombination. However, geneconversion is an important, and more common, form of (twocrossover) recombination which has been much less investigated in the algorithms literature. In this paper, we explicitly incorporate geneconversion into discrete methods to study historical recombination. We are concerned with algorithms for identifying and locating the extent of historical crossingover and geneconversion (along with singlenucleotide mutation), and problems of constructing full putative histories of those events. The novel technical issues concern the incorporation of geneconversion into recently developed discrete methods (Myers and Griffiths, 2003; Song et al., 2005) that compute lower and upperbound information on the amount of needed recombination without geneconversion. We first examine the most natural extension of the lower bound methods from Myers and Griffiths (2003), showing that the extension can be computed efficiently, but that this extension can only yield weak lower bounds. We then develop additional ideas that lead to higher lower bounds, and show how to solve, via integerlinear programming, a more biologically realistic version of the lower bound problem. We also show how to compute effective upper bounds on the number of needed singlecrossovers and geneconversions, along with explicit networks showing a putative history of mutations, singlecrossovers and geneconversions. Both lower and upper bound methods can handle data with missing entries, and the upper bound method can be used to infer missing entries with high accuracy. We validate the significance of these methods by showing that they can be effectively used to distinguish simulationderived sequences generated without geneconversion from sequences that were generated with geneconversion. We apply the methods to recently studied sequences of Arabidopsis thaliana, identifying many more regions in the sequences than were previously identified (Plagnol et al., 2006), where geneconversion may have played a significant role. Demonstration software is available at www.csif.cs.ucdavis.edu/∼gusfield. © 2007 Mary Ann Liebert, Inc."












Yun S. Song,
Yufeng Wu and
Dan Gusfield. Efficient computation of close lower and upper bounds on the minimum number of recombinations in biological sequence evolution. In ISMB05, Vol. 21:i413i422 of BIO, 2005. Keywords: minimum number, Program HapBound, Program SHRUB, recombination. Note: http://dx.doi.org/10.1093/bioinformatics/bti1033.
Toggle abstract
"Motivation: We are interested in studying the evolution of DNA single nucleotide polymorphism sequences which have undergone (meiotic) recombination. For a given set of sequences, computing the minimum number of recombinations needed to explain the sequences (with one mutation per site) is a standard question of interest, but it has been shown to be NPhard, and previous algorithms that compute it exactly work either only on very small datasets or on problems with special structure. Results: In this paper, we present efficient, practical methods for computing both upper and lower bounds on the minimum number of needed recombinations, and for constructing evolutionary histories that explain the input sequences. We study in detail the efficiency and accuracy of these algorithms on both simulated and real data sets. The algorithms produce very close upper and lower bounds, which match exactly in a surprisingly wide range of data. Thus, with the use of new, very effective lower bounding methods and an efficient algorithm for computing upper bounds, this approach allows the efficient, exact computation of the minimum number of needed recombinations, with high frequency in a large range of data. When upper and lower bounds match, evolutionary histories found by our algorithm correspond to the most parsimonious histories. © The Author 2005. Published by Oxford University Press. All rights reserved."






Dan Gusfield and
Dean Hickerson. A Fundamental, EfficientlyComputed Lower Bound on the Number of Recombinations Needed in Phylogenetic Networks. 2004. Note: UC Davis Computer Science Technical Report.



Dan Gusfield,
Satish Eddhu and
Charles Langley. The fine structure of galls in phylogenetic networks. In INCOMP, Vol. 16(4):459469, 2004. Keywords: explicit network, from sequences, galled tree, phylogenetic network, phylogeny, reconstruction. Note: http://wwwcsif.cs.ucdavis.edu/~gusfield/informs.pdf.
Toggle abstract
"A phylogenetic network is a generalization of a phylogenetic tree, allowing properties that are not treelike. With the growth of genomic data, much of which does not fit ideal tree models, there is greater need to understand the algorithmics and combinatorics of phylogenetic networks (Posada and Crandall 2001, Schierup and Hein 2000). Wang et al. (2001) studied the problem of constructing a phylogenetic network for a set of n binary sequences derived from the allzero ancestral sequence, when each site in the sequence can mutate from zero to one at most once in the network, and recombination between sequences is allowed. They showed that the problem of minimizing the number of recombinations in such networks is NPhard, but introduced a special case of the problem, i.e., to determine whether the sequences could be derived on a phylogenetic network where the recombination cycles are nodedisjoint. Wang et al. (2001) provide a sufficient, but not a necessary test, for such solutions. Gusfield et al. (2003, 2004) gave a polynomialtime algorithm that is both a necessary and sufficient test. In this paper, we study in much more detail the fine combinatorial structure of nodedisjoint cycles in phylogenetic networks, both for purposes of insight into phylogenetic networks and to speed up parts of the previous algorithm. We explicitly characterize all the ways in which mutations can be arranged on a disjoint cycle, and prove a strong necessary condition for a set of mutations to be on a disjoint cycle. The main contribution here is to show how structure in the phylogenetic network is reflected in the structure of an efficientlycomputable graph, called the conflict graph. The success of this approach suggests that additional insight into the structure of phylogenetic networks can be obtained by exploring structural properties of the conflict graph."



Dan Gusfield,
Satish Eddhu and
Charles Langley. Optimal, Efficient Reconstruction of Phylogenetic Networks with Constrained Recombination. In JBCB, Vol. 2(1):173213, 2004. Keywords: explicit network, from sequences, galled tree, phylogenetic network, phylogeny, recombination, reconstruction. Note: http://wwwcsif.cs.ucdavis.edu/~gusfield/exfinalrec.pdf.
Toggle abstract
"A phylogenetic network is a generalization of a phylogenetic tree, allowing structural properties that are not treelike. In a seminal paper, Wang et al.1 studied the problem of constructing a phylogenetic network, allowing recombination between sequences, with the constraint that the resulting cycles must be disjoint. We call such a phylogenetic network a "galledtree". They gave a polynomialtime algorithm that was intended to determine whether or not a set of sequences could be generated on galledtree. Unfortunately, the algorithm by Wang et al.1 is incomplete and does not constitute a necessary test for the existence of a galledtree for the data. In this paper, we completely solve the problem. Moreover, we prove that if there is a galledtree, then the one produced by our algorithm minimizes the number of recombinations over all phylogenetic networks for the data, even allowing multiplecrossover recombinations. We also prove that when there is a galledtree for the data, the galledtree minimizing the number of recombinations is "essentially unique" . We. also note two additional results: first, any set of sequences that can be derived on a galled tree can be derived on a true tree (without recombination cycles), where at most one back mutation per site is allowed; second, the site compatibility problem (which is NPhard in general) can be solved in polynomial time for any set of sequences that can be derived on a galled tree. Perhaps more important than the specific results about galledtrees, we introduce an approach that can be used to study recombination in general phylogenetic networks. This paper greatly extends the conference version that appears in an earlier work.8 PowerPoint slides of the conference talk can be found at our website. © Imperial College Press."











