


Hans-Jürgen Bandelt,
Vincent Macaulay and
Martin Richards. Median networks: speedy construction and greedy reduction, one simulation, and two case studies from human mtDNA. In MPE, Vol. 16:8-28, 2000. Keywords: from sequences, from splits, median network, phylogenetic network, phylogeny, reconstruction. Note: http://www.stats.gla.ac.uk/~vincent/papers/speedy.pdf.
"Molecular data sets characterized by few phylogenetically informative characters with a broad spectrum of mutation rates, such as intraspecific control-region sequence variation of human mitochondrial DNA (mtDNA), can be usefully visualized in the form of median networks. Here we provide a step-by-step guide to the construction of such networks by hand. We improve upon a previously implemented algorithm by outlining an efficient parametrized strategy amenable to large data sets, greedy reduction, which makes it possible to reconstruct some of the confounding recurrent mutations. This entails some postprocessing as well, which assists in capturing more parsimonious solutions. To simplify the creation of the resulting network by hand, we describe a speedy approach to network construction, based on a careful planning of the processing order. A coalescent simulation tailored to human mtDNA variation in Eurasia testifies to the usefulness of reduced median networks, while highlighting notorious problems faced by all phylogenetic methods in this context. Finally, we discuss two case studies involving the comparison of characters in the two hypervariable segments of the human mtDNA control region in the light of the worldwide control-region sequence database, as well as additional restriction fragment length polymorphism information. We conclude that only a minority of the mutations that hit the second segment occur at sites that would have a mutation rate comparable to those at most sites in the first segment. Discarding the known 'noisy' sites of the second segment enhances the analysis. © 2000 Academic Press."



Mark Clement,
David Posada and
Keith A. Crandall. TCS: a computer program to estimate gene genealogies. In MOLE, Vol. 9:1657-1659, 2000. Keywords: from sequences, parsimony, phylogenetic network, phylogeny, Program TCS, reconstruction, software, statistical parsimony. Note: http://darwin.uvigo.es/download/papers/08.tcs00.pdf.



Dan Gusfield,
Satish Eddhu and
Charles Langley. Optimal, Efficient Reconstruction of Phylogenetic Networks with Constrained Recombination. In JBCB, Vol. 2(1):173-213, 2004. Keywords: explicit network, from sequences, galled tree, phylogenetic network, phylogeny, recombination, reconstruction. Note: http://wwwcsif.cs.ucdavis.edu/~gusfield/exfinalrec.pdf.
"A phylogenetic network is a generalization of a phylogenetic tree, allowing structural properties that are not treelike. In a seminal paper, Wang et al. [1] studied the problem of constructing a phylogenetic network, allowing recombination between sequences, with the constraint that the resulting cycles must be disjoint. We call such a phylogenetic network a "galled-tree". They gave a polynomial-time algorithm that was intended to determine whether or not a set of sequences could be generated on a galled-tree. Unfortunately, the algorithm by Wang et al. [1] is incomplete and does not constitute a necessary test for the existence of a galled-tree for the data. In this paper, we completely solve the problem. Moreover, we prove that if there is a galled-tree, then the one produced by our algorithm minimizes the number of recombinations over all phylogenetic networks for the data, even allowing multiple-crossover recombinations. We also prove that when there is a galled-tree for the data, the galled-tree minimizing the number of recombinations is "essentially unique". We also note two additional results: first, any set of sequences that can be derived on a galled tree can be derived on a true tree (without recombination cycles), where at most one back mutation per site is allowed; second, the site compatibility problem (which is NP-hard in general) can be solved in polynomial time for any set of sequences that can be derived on a galled tree. Perhaps more important than the specific results about galled-trees, we introduce an approach that can be used to study recombination in general phylogenetic networks. This paper greatly extends the conference version that appears in an earlier work [8]. PowerPoint slides of the conference talk can be found at our website. © Imperial College Press."



Dan Gusfield,
Satish Eddhu and
Charles Langley. The fine structure of galls in phylogenetic networks. In INCOMP, Vol. 16(4):459-469, 2004. Keywords: explicit network, from sequences, galled tree, phylogenetic network, phylogeny, reconstruction. Note: http://wwwcsif.cs.ucdavis.edu/~gusfield/informs.pdf.
"A phylogenetic network is a generalization of a phylogenetic tree, allowing properties that are not treelike. With the growth of genomic data, much of which does not fit ideal tree models, there is greater need to understand the algorithmics and combinatorics of phylogenetic networks (Posada and Crandall 2001, Schierup and Hein 2000). Wang et al. (2001) studied the problem of constructing a phylogenetic network for a set of n binary sequences derived from the all-zero ancestral sequence, when each site in the sequence can mutate from zero to one at most once in the network, and recombination between sequences is allowed. They showed that the problem of minimizing the number of recombinations in such networks is NP-hard, but introduced a special case of the problem, i.e., to determine whether the sequences could be derived on a phylogenetic network where the recombination cycles are node-disjoint. Wang et al. (2001) provide a sufficient, but not a necessary test, for such solutions. Gusfield et al. (2003, 2004) gave a polynomial-time algorithm that is both a necessary and sufficient test. In this paper, we study in much more detail the fine combinatorial structure of node-disjoint cycles in phylogenetic networks, both for purposes of insight into phylogenetic networks and to speed up parts of the previous algorithm. We explicitly characterize all the ways in which mutations can be arranged on a disjoint cycle, and prove a strong necessary condition for a set of mutations to be on a disjoint cycle. The main contribution here is to show how structure in the phylogenetic network is reflected in the structure of an efficiently-computable graph, called the conflict graph. The success of this approach suggests that additional insight into the structure of phylogenetic networks can be obtained by exploring structural properties of the conflict graph."



Jotun Hein. A heuristic method to reconstruct the history of sequences subject to recombination. In JME, Vol. 36(4):396-405, 1993. Keywords: explicit network, from sequences, heuristic, parsimony, phylogenetic network, phylogeny, Program RecPars, recombination, recombination detection, software. Note: http://dx.doi.org/10.1007/BF00182187.





Cam Thach Nguyen,
Nguyen Bao Nguyen,
Wing-Kin Sung and
Louxin Zhang. Reconstructing Recombination Network from Sequence Data: The Small Parsimony Problem. In TCBB, Vol. 4(3):394-402, 2007. Keywords: explicit network, from sequences, labeling, NP complete, parsimony, phylogenetic network, phylogeny. Note: http://www.cs.washington.edu/homes/ncthach/Papers/TCBB2007.pdf.



Alan R. Templeton,
Keith A. Crandall and
Charles F. Sing. A Cladistic Analysis of Phenotypic Associations With Haplotypes Inferred From Restriction Endonuclease Mapping and DNA Sequence Data. III. Cladogram Estimation. In GEN, Vol. 132:619-633, 1992. Keywords: from sequences, parsimony, phylogenetic network, phylogeny, Program TCS, recombination, reconstruction, statistical parsimony. Note: http://www.genetics.org/cgi/content/abstract/132/2/619.



Dan Gusfield,
Vikas Bansal,
Vineet Bafna and
Yun S. Song. A Decomposition Theory for Phylogenetic Networks and Incompatible Characters. In JCB, Vol. 14(10):1247-1272, 2007. Keywords: explicit network, from sequences, galled tree, phylogenetic network, phylogeny, Program Beagle, Program GalledTree, recombination, reconstruction, software. Note: http://www.eecs.berkeley.edu/~yss/Pub/decomposition.pdf.





Yun S. Song,
Zhihong Ding,
Dan Gusfield,
Charles Langley and
Yufeng Wu. Algorithms to Distinguish the Role of Gene-Conversion from Single-Crossover Recombination in the Derivation of SNP Sequences in Populations. In JCB, Vol. 14(10):1273-1286, 2007. Keywords: ARG, from sequences, phylogenetic network, phylogeny, Program SHRUB, reconstruction. Note: http://dx.doi.org/10.1089/cmb.2007.0096.
"Meiotic recombination is a fundamental biological event and one of the principal evolutionary forces responsible for shaping genetic variation within species. In addition to its fundamental role, recombination is central to several critical applied problems. The most important example is "association mapping" in populations, which is widely hoped to help find genes that influence genetic diseases (Carlson et al., 2004; Clark, 2003). Hence, a great deal of recent attention has focused on problems of inferring the historical derivation of sequences in populations when both mutations and recombinations have occurred. In the algorithms literature, most of that recent work has been directed to single-crossover recombination. However, gene-conversion is an important, and more common, form of (two-crossover) recombination which has been much less investigated in the algorithms literature. In this paper, we explicitly incorporate gene-conversion into discrete methods to study historical recombination. We are concerned with algorithms for identifying and locating the extent of historical crossing-over and gene-conversion (along with single-nucleotide mutation), and problems of constructing full putative histories of those events. The novel technical issues concern the incorporation of gene-conversion into recently developed discrete methods (Myers and Griffiths, 2003; Song et al., 2005) that compute lower- and upper-bound information on the amount of needed recombination without gene-conversion. We first examine the most natural extension of the lower bound methods from Myers and Griffiths (2003), showing that the extension can be computed efficiently, but that this extension can only yield weak lower bounds. We then develop additional ideas that lead to higher lower bounds, and show how to solve, via integer-linear programming, a more biologically realistic version of the lower bound problem. We also show how to compute effective upper bounds on the number of needed single-crossovers and gene-conversions, along with explicit networks showing a putative history of mutations, single-crossovers and gene-conversions. Both lower and upper bound methods can handle data with missing entries, and the upper bound method can be used to infer missing entries with high accuracy. We validate the significance of these methods by showing that they can be effectively used to distinguish simulation-derived sequences generated without gene-conversion from sequences that were generated with gene-conversion. We apply the methods to recently studied sequences of Arabidopsis thaliana, identifying many more regions in the sequences than were previously identified (Plagnol et al., 2006), where gene-conversion may have played a significant role. Demonstration software is available at www.csif.cs.ucdavis.edu/~gusfield. © 2007 Mary Ann Liebert, Inc."





Patricia Buendia and
Giri Narasimhan. Sliding MinPD: Building evolutionary networks of serial samples via an automated recombination detection approach. In BIO, Vol. 23(22):2993-3000, 2007. Keywords: from sequences, phylogenetic network, phylogeny, Program Sliding MinPD, recombination, recombination detection, serial evolutionary networks, software. Note: http://dx.doi.org/10.1093/bioinformatics/btm413.
"Motivation: Traditional phylogenetic methods assume tree-like evolutionary models and are likely to perform poorly when provided with sequence data from fast-evolving, recombining viruses. Furthermore, these methods assume that all the sequence data are from contemporaneous taxa, which is not valid for serially-sampled data. A more general approach is proposed here, referred to as the Sliding MinPD method, that reconstructs evolutionary networks for serially-sampled sequences in the presence of recombination. Results: Sliding MinPD combines distance-based phylogenetic methods with automated recombination detection based on the best-known sliding window approaches to reconstruct serial evolutionary networks. Its performance was evaluated through comprehensive simulation studies and was also applied to a set of serially-sampled HIV sequences from a single patient. The resulting network organizations reveal unique patterns of viral evolution and may help explain the emergence of disease-associated mutants and drug-resistant strains with implications for patient prognosis and treatment strategies. © The Author 2007. Published by Oxford University Press. All rights reserved."



Supriya Munshaw and
Thomas B. Kepler. An Information-Theoretic Method for the Treatment of Plural Ancestry in Phylogenetics. In MBE, Vol. 25(6):1199-1208, 2008. Keywords: explicit network, from sequences, heuristic, phylogenetic network, reconstruction, simulated annealing, software. Note: http://dx.doi.org/10.1093/molbev/msn066.
"In the presence of recombination and gene conversion, a given genomic segment may inherit information from 2 distinct immediate ancestors. The importance of this type of molecular inheritance has become increasingly clear over the years, and the potential for erroneous inference when it is not accounted for in the statistical model is well documented. Yet, the inclusion of plural ancestry (PA) in phylogenetic analysis is still not routine. This omission is due to the greater difficulty of phylogenetic inference on general acyclic graphs compared with that on trees and the accompanying computational burden. We have developed a technique for phylogenetic inference in the presence of PA based on the principle of minimum description length, which assigns a cost, the description length, to each network topology given the observed sequence data. The description length combines the cost of poor data fit and model complexity in terms of information. This device allows us to search through network topologies to minimize the total description length. By comparing the best models obtained with and without PA, one can determine whether or not recombination has played an active role in the evolution of the genes under investigation, identify those genes that appear to be mosaic, and infer the phylogenetic network that best represents the history of the alignment. We show that the method performs well on simulated data and demonstrate its application on HIV env gene sequence data from 8 human subjects. The software implementation of the method is available upon request. © The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved."



Galina Glazko,
Vladimir Makarenkov,
Jing Liu and
Arcady Mushegian. Evolutionary history of bacteriophages with double-stranded DNA genomes. In Biology Direct, Vol. 2(36), 2007. Keywords: explicit network, from sequences, phylogenetic network, phylogeny, Program T REX. Note: http://dx.doi.org/10.1186/17456150236.
"Background: Reconstruction of evolutionary history of bacteriophages is a difficult problem because of fast sequence drift and lack of omnipresent genes in phage genomes. Moreover, losses and recombinational exchanges of genes are so pervasive in phages that the plausibility of phylogenetic inference in phage kingdom has been questioned. Results: We compiled the profiles of presence and absence of 803 orthologous genes in 158 completely sequenced phages with double-stranded DNA genomes and used these gene content vectors to infer the evolutionary history of phages. There were 18 well-supported clades, mostly corresponding to accepted genera, but in some cases appearing to define new taxonomic groups. Conflicts between this phylogeny and trees constructed from sequence alignments of phage proteins were exploited to infer 294 specific acts of intergenome gene transfer. Conclusion: A notoriously reticulate evolutionary history of fast-evolving phages can be reconstructed in considerable detail by quantitative comparative genomics. © 2007 Glazko et al; licensee BioMed Central Ltd."



Bin Ma,
Lusheng Wang and
Ming Li. Fixed topology alignment with recombination. In DAM, Vol. 104:281-300, 2000. Keywords: approximation, explicit network, from network, from sequences, galled tree, inapproximability, phylogenetic network, phylogeny, recombination. Note: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.7759.
"Background: Reticulate events play an important role in determining evolutionary relationships. The problem of computing the minimum number of such events to explain discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with reticulation number larger than 40-50. Results: Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations. Conclusions: Using simulations, we demonstrate that these algorithms run quickly for large and difficult instances, producing solutions that are very close to optimality. As a spin-off from our simulations we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees: this is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work (SIDMA 26(4):1635-1656, TCBB 10(1):18-25, SIDMA 28(1):49-66) and are publicly available. We also apply our methods to real data. © 2014 van Iersel et al.; licensee BioMed Central Ltd."



Sagi Snir and
Tamir Tuller. The NET-HMM approach: Phylogenetic Network Inference by Combining Maximum Likelihood and Hidden Markov Models. In JBCB, Vol. 7(4):625-644, 2009. Keywords: explicit network, from sequences, HMM, lateral gene transfer, likelihood, phylogenetic network, phylogeny, statistical model. Note: http://research.haifa.ac.il/~ssagi/published%20papers/SnirNETHMMJBCB2009.pdf.
"Horizontal gene transfer (HGT) is the event of transferring genetic material from one lineage in the evolutionary tree to a different lineage. HGT plays a major role in bacterial genome diversification and is a significant mechanism by which bacteria develop resistance to antibiotics. Although the prevailing assumption is of complete HGT, cases of partial HGT (which are also named chimeric HGT) where only part of a gene is horizontally transferred, have also been reported, albeit less frequently. In this work we suggest a new probabilistic model, the NET-HMM, for analyzing and modeling phylogenetic networks. This new model captures the biologically realistic assumption that neighboring sites of DNA or amino acid sequences are not independent, which increases the accuracy of the inference. The model describes the phylogenetic network as a Hidden Markov Model (HMM), where each hidden state is related to one of the network's trees. One of the advantages of the NET-HMM is its ability to infer partial HGT as well as complete HGT. We describe the properties of the NET-HMM, devise efficient algorithms for solving a set of problems related to it, and implement them in software. We also provide a novel complementary significance test for evaluating the fitness of a model (NET-HMM) to a given dataset. Using NET-HMM, we are able to answer interesting biological questions, such as inferring the length of partial HGT's and the affected nucleotides in the genomic sequences, as well as inferring the exact location of HGT events along the tree branches. These advantages are demonstrated through the analysis of synthetical inputs and three different biological inputs. © 2009 Imperial College Press."



Sarah C. Ayling and
Terence A. Brown. Novel methodology for construction and pruning of quasi-median networks. In BMCB, Vol. 9:115, 2008. Keywords: abstract network, from sequences, median network, phylogenetic network, phylogeny, quasimedian network, reconstruction. Note: http://dx.doi.org/10.1186/147121059115.
"BACKGROUND: Visualising the evolutionary history of a set of sequences is a challenge for molecular phylogenetics. One approach is to use undirected graphs, such as median networks, to visualise phylogenies where reticulate relationships such as recombination or homoplasy are displayed as cycles. Median networks contain binary representations of sequences as nodes, with edges connecting those sequences differing at one character; hypothetical ancestral nodes are invoked to generate a connected network which contains all most parsimonious trees. Quasi-median networks are a generalisation of median networks which are not restricted to binary data, although phylogenetic information contained within the multistate positions can be lost during the preprocessing of data. Where the history of a set of samples contains frequent homoplasies or recombination events, quasi-median networks will have a complex topology. Graph reduction or pruning methods have been used to reduce network complexity but some of these methods are inapplicable to datasets in which recombination has occurred and others are procedurally complex and/or result in disconnected networks. RESULTS: We address the problems inherent in construction and reduction of quasi-median networks. We describe a novel method of generating quasi-median networks that uses all characters, both binary and multistate, without imposing an arbitrary ordering of the multistate partitions. We also describe a pruning mechanism which maintains at least one shortest path between observed sequences, displaying the underlying relations between all pairs of sequences while maintaining a connected graph. CONCLUSION: Application of this approach to 5S rDNA sequence data from sea beet produced a pruned network within which genetic isolation between populations by distance was evident, demonstrating the value of this approach for exploration of evolutionary relationships."



Tal Dagan,
Yael Artzy-Randrup and
William Martin. Modular networks and cumulative impact of lateral transfer in prokaryote genome evolution. In PNAS, Vol. 105:10039-10044, 2008. Keywords: from sequences, from species tree, heuristic, lateral gene transfer, phylogenetic network, phylogeny, reconstruction. Note: http://dx.doi.org/10.1073/pnas.0800679105.
"Lateral gene transfer is an important mechanism of natural variation among prokaryotes, but the significance of its quantitative contribution to genome evolution is debated. Here, we report networks that capture both vertical and lateral components of evolutionary history among 539,723 genes distributed across 181 sequenced prokaryotic genomes. Partitioning of these networks by an eigenspectrum analysis identifies community structure in prokaryotic gene-sharing networks, the modules of which do not correspond to a strictly hierarchical prokaryotic classification. Our results indicate that, on average, at least 81 ± 15% of the genes in each genome studied were involved in lateral gene transfer at some point in their history, even though they can be vertically inherited after acquisition, uncovering a substantial cumulative effect of lateral gene transfer on longer evolutionary time scales. © 2008 by The National Academy of Sciences of the USA."



Hans-Jürgen Bandelt and
Arne Dür. Translating DNA data tables into quasi-median networks for parsimony analysis and error detection. In MPE, Vol. 42(1):256-271, 2007. Keywords: abstract network, from sequences, parsimony, phylogenetic network, phylogeny, quasimedian network, reconstruction. Note: http://dx.doi.org/10.1016/j.ympev.2006.07.013.
"Every DNA data table can be turned into a quasi-median network that faithfully represents the data. We show that for (weighted) condensed data tables the associated network harbors all most parsimonious reconstructions for any tree that connects the sampled haplotypes. Structural features of this network can be computed directly from the data table. The key principle repeatedly used is that the quasi-median network is uniquely determined by the subtables for pairs of characters. The translation of a table into a network enhances the understanding of the properties of the data in regard to homoplasy and potential artifacts. The total number of nodes of such a network measures the complexity of the data. In particular, networks that display the results of filter analyses by which hotspot mutations are removed help to detect data idiosyncrasies and thus pinpoint sequencing problems. A pertinent example drawn from human mtDNA illustrates these points. © 2006 Elsevier Inc. All rights reserved."



Leo van Iersel and
Steven Kelk. When two trees go to war. In JTB, Vol. 269(1):245-255, 2011. Keywords: APX hard, explicit network, from clusters, from rooted trees, from sequences, from triplets, level k phylogenetic network, minimum number, NP complete, phylogenetic network, phylogeny, polynomial, reconstruction. Note: http://arxiv.org/abs/1004.5332.
"Rooted phylogenetic networks are used to model non-treelike evolutionary histories. Such networks are often constructed by combining trees, clusters, triplets or characters into a single network that in some well-defined sense simultaneously represents them all. We review these four models and investigate how they are related. Motivated by the parsimony principle, one often aims to construct a network that contains as few reticulations (non-treelike evolutionary events) as possible. In general, the model chosen influences the minimum number of reticulation events required. However, when one obtains the input data from two binary (i.e. fully resolved) trees, we show that the minimum number of reticulations is independent of the model. The number of reticulations necessary to represent the trees, triplets, clusters (in the softwired sense) and characters (with unrestricted multiple crossover recombination) are all equal. Furthermore, we show that these results also hold when not the number of reticulations but the level of the constructed network is minimised. We use these unification results to settle several computational complexity questions that have been open in the field for some time. We also give explicit examples to show that already for data obtained from three binary trees the models begin to diverge. © 2010 Elsevier Ltd."



Hyun Jung Park,
Guohua Jin and
Luay Nakhleh. Bootstrap-based Support of HGT Inferred by Maximum Parsimony. In BMCEB, Vol. 10:131, 2010. Keywords: bootstrap, explicit network, from sequences, lateral gene transfer, parsimony, phylogenetic network, phylogeny, Program Nepal, reconstruction. Note: http://dx.doi.org/10.1186/1471214810131.
"Background. Maximum parsimony is one of the most commonly used criteria for reconstructing phylogenetic trees. Recently, Nakhleh and co-workers extended this criterion to enable reconstruction of phylogenetic networks, and demonstrated its application to detecting reticulate evolutionary relationships. However, one of the major problems with this extension has been that it favors more complex evolutionary relationships over simpler ones, thus having the potential for overestimating the amount of reticulation in the data. An ad hoc solution to this problem that has been used entails inspecting the improvement in the parsimony length as more reticulation events are added to the model, and stopping when the improvement is below a certain threshold. Results. In this paper, we address this problem in a more systematic way, by proposing a nonparametric bootstrap-based measure of support of inferred reticulation events, and using it to determine the number of those events, as well as their placements. A number of samples is generated from the given sequence alignment, and reticulation events are inferred based on each sample. Finally, the support of each reticulation event is quantified based on the inferences made over all samples. Conclusions. We have implemented our method in the NEPAL software tool (available publicly at http://bioinfo.cs.rice.edu/), and studied its performance on both biological and simulated data sets. While our studies show very promising results, they also highlight issues that are inherently challenging when applying the maximum parsimony criterion to detect reticulate evolution. © 2010 Park et al; licensee BioMed Central Ltd."





Marta Melé,
Asif Javed,
Marc Pybus,
Francesc Calafell,
Laxmi Parida,
Jaume Bertranpetit and
Genographic Consortium. A New Method to Reconstruct Recombination Events at a Genomic Scale. In PLoS Computational Biology, Vol. 6(11):e1001010, 2010. Keywords: explicit network, from sequences, phylogenetic network, phylogeny. Note: http://dx.doi.org/10.1371/journal.pcbi.1001010.
"Recombination is one of the main forces shaping genome diversity, but the information it generates is often overlooked. A recombination event creates a junction between two parental sequences that may be transmitted to the subsequent generations. Just like mutations, these junctions carry evidence of the shared past of the sequences. We present the IRiS algorithm, which detects past recombination events from extant sequences and specifies the place of each recombination and which are the recombinant sequences. We have validated and calibrated IRiS for the human genome using coalescent simulations replicating standard human demographic history and a variable recombination rate model, and we have fine-tuned IRiS parameters to simultaneously optimize for false discovery rate, sensitivity, and accuracy in placing the recombination events in the sequence. Newer recombinations overwrite traces of past ones and our results indicate more recent recombinations are detected by IRiS with greater sensitivity. IRiS analysis of the MS32 region, previously studied using sperm typing, showed good concordance with estimated recombination rates. We also applied IRiS to haplotypes for 18 X-chromosome regions in HapMap Phase 3 populations. Recombination events detected for each individual were recoded as binary allelic states and combined into recotypes. Principal component analysis and multidimensional scaling based on recotypes reproduced the relationships between the eleven HapMap Phase III populations that can be expected from known human population history, thus further validating IRiS. We believe that our new method will contribute to the study of the distribution of recombination events across the genomes and, for the first time, it will allow the use of recombination as a genetic marker to study human genetic variation. © 2010 Melé et al."



Sagi Snir and
Edward Trifonov. A Novel Technique for Detecting Putative Horizontal Gene Transfer in the Sequence Space. In JCB, Vol. 17(11):1535-1548, 2010. Keywords: from sequences, phylogenetic network, phylogeny, reconstruction. Note: http://research.haifa.ac.il/~ssagi/published%20papers/JCBHGT.pdf.
"Horizontal transfer (HT) is the event of a DNA sequence being transferred between species not by inheritance. This phenomenon violates the tree-like evolution of the species under study, turning the trees into networks. At the sequence level, HT offers basic characteristics that enable not only clear identification and distinguishing from other sequence similarity cases but also the possibility of dating the events. We developed a novel, self-contained technique to identify relatively recent horizontal transfer elements (HTEs) in the sequences. Appropriate formalism allows one to obtain confidence values for the events detected. The technique does not rely on such problematic prerequisites as reliable phylogeny and/or statistically justified pairwise sequence alignment. In conjunction with the unique properties of HT, it gives rise to a two-level sequence similarity algorithm that, to the best of our knowledge, has not been explored. From an evolutionary perspective, the novelty of the work is in the combination of small scale and large scale mutational events. The technique is employed on both simulated and real biological data. The simulation results show high capability of discriminating between HT and conserved regions. On the biological data, the method detected documented HTEs along with their exact locations in the recipient genomes. Supplementary Material is available online at www.liebertonline.com/cmb. Copyright 2010, Mary Ann Liebert, Inc."



Alix Boc and
Vladimir Makarenkov. Towards an accurate identification of mosaic genes and partial horizontal gene transfers. In NAR, Vol. 39(21):e144, 2011. Keywords: explicit network, from sequences, lateral gene transfer, phylogenetic network, phylogeny, Program T REX, reconstruction. Note: http://dx.doi.org/10.1093/nar/gkr735.
"Many bacteria and viruses adapt to varying environmental conditions through the acquisition of mosaic genes. A mosaic gene is composed of alternating sequence polymorphisms either belonging to the host original allele or derived from the integrated donor DNA. Often, the integrated sequence contains a selectable genetic marker (e.g. marker allowing for antibiotic resistance). An effective identification of mosaic genes and detection of corresponding partial horizontal gene transfers (HGTs) are among the most important challenges posed by evolutionary biology. We developed a method for detecting partial HGT events and related intragenic recombination giving rise to the formation of mosaic genes. A bootstrap procedure incorporated in our method is used to assess the support of each predicted partial gene transfer. The proposed method can be also applied to confirm or discard complete (i.e. traditional) horizontal gene transfers detected by any HGT inferring method. While working on a full-genome scale, the new method can be used to assess the level of mosaicism in the considered genomes as well as the rates of complete and partial HGT underlying their evolution. © 2011 The Author(s)."



Lavanya Kannan and
Ward C Wheeler. Maximum Parsimony on Phylogenetic Networks. In ALMOB, Vol. 7:9, 2012. Keywords: dynamic programming, explicit network, from sequences, heuristic, parsimony, phylogenetic network, phylogeny. Note: http://dx.doi.org/10.1186/1748-7188-7-9.
"Background: Phylogenetic networks are generalizations of phylogenetic trees, that are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned on the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past. Results: In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network; and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as Sankoff and Fitch algorithms extend naturally for networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied for any cost matrix that may contain unequal substitution costs of transforming between different characters along different edges of the network. We analyzed this for experimental data on 10 leaves or fewer with at most 2 reticulations and found that for almost all networks, the bounds returned by the heuristics matched with the exhaustively determined optimum parsimony scores. Conclusion: The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add additional cost considerations to prefer simpler structures, such as trees over networks.
The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are common to all the branching patterns introduced by the reticulate vertices. Thus the score contains an inbuilt cost for the number of reticulate vertices in the network, and would provide a criterion that is comparable among all networks. Although the problem of finding the parsimony score on the network is believed to be computationally hard to solve, heuristics such as the ones described here would be beneficial in our efforts to find a most parsimonious network. © 2012 Kannan and Wheeler; licensee BioMed Central Ltd."
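
As background for the abstract above, the following is a minimal sketch of the classical Fitch algorithm on a rooted binary tree, which is the tree case that the paper extends to networks. It is not the authors' network algorithm. The nested-tuple tree encoding, the function name `fitch`, and the toy data are assumptions made for illustration only.

```python
# Fitch's small-parsimony count for one character on a rooted binary tree.
# Assumed encoding (hypothetical, for this sketch only): a leaf is a string
# name, an internal vertex is a pair (left_subtree, right_subtree).

def fitch(tree, leaf_state):
    """Return (candidate state set at the root of `tree`, minimum changes)."""
    if isinstance(tree, str):
        # A leaf carries its observed state and needs no change.
        return {leaf_state[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_state)
    right_set, right_cost = fitch(right, leaf_state)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no additional change.
        return common, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Toy example: four leaves, one character.
tree = (("a", "b"), ("c", "d"))
states = {"a": "A", "b": "C", "c": "A", "d": "A"}
root_set, score = fitch(tree, states)
print(root_set, score)
```

On a network, a reticulate vertex has more than one parent, so the state sets arriving along different paths may conflict; this is the difficulty the abstract refers to.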





Mareike Fischer,
Leo van Iersel,
Steven Kelk and
Celine Scornavacca. On Computing The Maximum Parsimony Score Of A Phylogenetic Network. In SIDMA, Vol. 29(1):559-585, 2015. Keywords: APX hard, cluster containment, explicit network, FPT, from network, from sequences, integer linear programming, level k phylogenetic network, NP complete, parsimony, phylogenetic network, phylogeny, polynomial, Program MPNet, reconstruction, software. Note: http://arxiv.org/abs/1302.2430.



Lavanya Kannan and
Ward C Wheeler. Exactly Computing the Parsimony Scores on Phylogenetic Networks Using Dynamic Programming. In JCB, Vol. 21(4):303-319, 2014. Keywords: explicit network, exponential algorithm, from network, from sequences, parsimony, phylogenetic network, phylogeny, reconstruction.
"Scoring a given phylogenetic network is the first step that is required in searching for the best evolutionary framework for a given dataset. Using the principle of maximum parsimony, we can score phylogenetic networks based on the minimum number of state changes across a subset of edges of the network for each character that are required for a given set of characters to realize the input states at the leaves of the networks. Two such subsets of edges of networks are interesting in light of studying evolutionary histories of datasets: (i) the set of all edges of the network, and (ii) the set of all edges of a spanning tree that minimizes the score. The problems of finding the parsimony scores under these two criteria define slightly different mathematical problems that are both NP-hard. In this article, we show that both problems, with scores generalized to adding substitution costs between states on the endpoints of the edges, can be solved exactly using dynamic programming. We show that our algorithms require O(m^p k) storage at each vertex (per character), where k is the number of states the character can take, p is the number of reticulate vertices in the network, m = k for the problem with edge set (i), and m = 2 for the problem with edge set (ii). This establishes an O(n m^p k^2) algorithm for both the problems (n is the number of leaves in the network), which are extensions of Sankoff's algorithm for finding the parsimony scores for phylogenetic trees. We will discuss improvements in the complexities and show that for phylogenetic networks whose underlying undirected graphs have disjoint cycles, the storage at each vertex can be reduced to O(mk), thus making the algorithm polynomial for this class of networks. We will present some properties of the two approaches and guidance on choosing between the criteria, as well as traverse through the network space using either of the definitions.
We show that our methodology provides an effective means to study a wide variety of datasets. © Copyright 2014, Mary Ann Liebert, Inc. 2014."
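
For orientation, here is a minimal sketch of Sankoff's dynamic programming on a tree, the algorithm the abstract above describes as being extended to networks. It covers only the tree case (no reticulate vertices) and is not the authors' implementation. The tree encoding, the function name `sankoff`, and the transition/transversion cost matrix are assumptions chosen for illustration.

```python
import math

# Sankoff's algorithm for one character on a rooted tree.
# Assumed encoding (hypothetical, for this sketch only): a leaf is a string
# name, an internal vertex is a tuple of child subtrees.

def sankoff(tree, leaf_state, states, cost):
    """Return {s: minimum cost of the subtree if its root is in state s}."""
    if isinstance(tree, str):
        # A leaf is free in its observed state and impossible in any other.
        return {s: (0 if s == leaf_state[tree] else math.inf) for s in states}
    child_tables = [sankoff(child, leaf_state, states, cost) for child in tree]
    # For each root state s, every child independently picks its best state t,
    # paying the substitution cost s -> t on the connecting edge.
    return {
        s: sum(min(cost[s][t] + table[t] for t in states) for table in child_tables)
        for s in states
    }


states = "ACGT"
purines = {"A", "G"}
# Illustrative cost matrix: transitions cost 1, transversions cost 2.
cost = {
    s: {t: 0 if s == t else (1 if (s in purines) == (t in purines) else 2)
        for t in states}
    for s in states
}

tree = (("a", "b"), ("c", "d"))
leaf_state = {"a": "A", "b": "G", "c": "C", "d": "A"}
table = sankoff(tree, leaf_state, states, cost)
best = min(table.values())
print(table, best)
```

Each vertex stores one value per state (k values) and each edge takes O(k^2) work, which corresponds to the bounds quoted in the abstract when there are no reticulate vertices (p = 0).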



Joel Sjöstrand,
Ali Tofigh,
Vincent Daubin,
Lars Arvestad,
Bengt Sennblad and
Jens Lagergren. A Bayesian Method for Analyzing Lateral Gene Transfer. In Systematic Biology, Vol. 63(3):409-420, 2014. Keywords: bayesian, duplication, from rooted trees, from sequences, from species tree, lateral gene transfer, loss, phylogenetic network, phylogeny, Program JPrIME-DLTRS, reconstruction. Note: http://dx.doi.org/10.1093/sysbio/syu007.







Gergely J. Szöllösi,
Adrián Arellano Davín,
Eric Tannier,
Vincent Daubin and
Bastien Boussau. Genome-scale phylogenetic analysis finds extensive gene transfer among fungi. In Philosophical Transactions of the Royal Society of London B: Biological Sciences, Vol. 370(1678):1-11, 2015. Keywords: duplication, from sequences, lateral gene transfer, loss, phylogenetic network, phylogeny, Program ALE, reconstruction. Note: http://dx.doi.org/10.1098/rstb.2014.0335.



Jessica W. Leigh and
David Bryant. PopART: full-feature software for haplotype network construction. In MEE, Vol. 6(9):1110–1116, 2015. Keywords: abstract network, from sequences, haplotype network, MedianJoining, phylogenetic network, phylogeny, population genetics, Program PopART, Program TCS, software. Note: http://dx.doi.org/10.1111/2041-210X.12410.





Hussein A. Hejase and
Kevin J. Liu. A scalability study of phylogenetic network inference methods using empirical datasets and simulations involving a single reticulation. Vol. 17(422):1-12, 2016. Keywords: abstract network, evaluation, from sequences, phylogenetic network, phylogeny, Program PhyloNet, Program PhyloNetworks SNaQ, reconstruction, simulation, unicyclic network. Note: http://dx.doi.org/10.1186/s12859-016-1277-1.











Daniel H. Huson and
Tobias Kloepper. Computing recombination networks from binary sequences. In ECCB05, Vol. 21(suppl. 2):ii159-ii165 of BIO, 2005. Keywords: from sequences, phylogenetic network, phylogeny, recombination. Note: http://dx.doi.org/10.1093/bioinformatics/bti1126.
"Motivation: Phylogenetic networks are becoming an important tool in molecular evolution, as the evolutionary role of reticulate events, such as hybridization, horizontal gene transfer and recombination, is becoming more evident, and as the available data is dramatically increasing in quantity and quality. Results: This paper addresses the problem of computing a most parsimonious recombination network for an alignment of binary sequences that are assumed to have arisen under the 'infinite sites' model of evolution with recombinations. Using the concept of a splits network as the underlying datastructure, this paper shows how a recent method designed for the computation of hybridization networks can be extended to also compute recombination networks. A robust implementation of the approach is provided and is illustrated using a number of real biological datasets. © The Author 2005. Published by Oxford University Press. All rights reserved."





Rune Lyngsø,
Yun S. Song and
Jotun Hein. Minimum Recombination Histories by Branch and Bound. In WABI05, Vol. 3692:239-250 of LNCS, springer, 2005. Keywords: ARG, branch and bound, from sequences, minimum number, Program Beagle, recombination, reconstruction, software. Note: http://www.cs.ucdavis.edu/~yssong/Pub/WABI05239.pdf.



Lusheng Wang,
Kaizhong Zhang and
Louxin Zhang. Perfect phylogenetic networks with recombination. In SAC01, Pages 46-50, 2001. Keywords: from sequences, galled tree, NP complete, perfect, phylogenetic network, phylogeny, polynomial, recombination, reconstruction. Note: http://dx.doi.org/10.1145/372202.372271.



Sagi Snir and
Tamir Tuller. Novel Phylogenetic Network Inference by Combining Maximum Likelihood and Hidden Markov Models. In WABI08, Vol. 5251:354-368 of LNCS, springer, 2008. Keywords: explicit network, from sequences, HMM, lateral gene transfer, likelihood, phylogenetic network, phylogeny, statistical model. Note: http://dx.doi.org/10.1007/978-3-540-87361-7_30.
"Horizontal Gene Transfer (HGT) is the event of transferring genetic material from one lineage in the evolutionary tree to a different lineage. HGT plays a major role in bacterial genome diversification and is a significant mechanism by which bacteria develop resistance to antibiotics. Although the prevailing assumption is of complete HGT, cases of partial HGT (which are also named chimeric HGT) where only part of a gene is horizontally transferred, have also been reported, albeit less frequently. In this work we suggest a new probabilistic model for analyzing and modeling phylogenetic networks, the NET-HMM. This new model captures the biologically realistic assumption that neighboring sites of DNA or amino acid sequences are not independent, which increases the accuracy of the inference. The model describes the phylogenetic network as a Hidden Markov Model (HMM), where each hidden state is related to one of the network's trees. One of the advantages of the NET-HMM is its ability to infer partial HGT as well as complete HGT. We describe the properties of the NET-HMM, devise efficient algorithms for solving a set of problems related to it, and implement them in software. We also provide a novel complementary significance test for evaluating the fitness of a model (NET-HMM) to a given data set. Using NET-HMM we are able to answer interesting biological questions, such as inferring the length of partial HGT's and the affected nucleotides in the genomic sequences, as well as inferring the exact location of HGT events along the tree branches. These advantages are demonstrated through the analysis of synthetical inputs and two different biological inputs. © 2008 Springer-Verlag Berlin Heidelberg."



Ernst Althaus and
Rouven Naujoks. Reconstructing Phylogenetic Networks with One Recombination. In Proceedings of the seventh International Workshop on Experimental Algorithms (WEA'08), Vol. 5038:275-288 of LNCS, springer, 2008. Keywords: enumeration, explicit network, exponential algorithm, from sequences, generation, parsimony, phylogenetic network, phylogeny, reconstruction, unicyclic network. Note: http://dx.doi.org/10.1007/978-3-540-68552-4_21.
"In this paper we propose a new method for reconstructing phylogenetic networks under the assumption that recombination events have occurred rarely. For a fixed number of recombinations, we give a generalization of the maximum parsimony criterion. Furthermore, we describe an exact algorithm for one recombination event and show that in this case our method is not only able to identify the recombined sequence but also to reliably reconstruct the complete evolutionary history. © 2008 Springer-Verlag Berlin Heidelberg."



Cuong Than,
Guohua Jin and
Luay Nakhleh. Integrating Sequence and Topology for Efficient and Accurate Detection of Horizontal Gene Transfer. In Proceedings of the Sixth RECOMB Comparative Genomics Satellite Workshop (RECOMB-CG'08), Vol. 5267:113-127 of LNCS, springer, 2008. Keywords: bootstrap, explicit network, from rooted trees, from sequences, lateral gene transfer, phylogenetic network, phylogeny, Program Nepal, Program PhyloNet, reconstruction. Note: http://www.cs.rice.edu/~nakhleh/Papers/recombcg08.pdf, slides available at http://igm.univ-mlv.fr/RCG08/RCG08slides/Cuong_Than_RCG08.pdf.



Bin Ma,
Lusheng Wang and
Ming Li. Fixed topology alignment with recombination. In CPM98, Vol. 1448:174-188 of LNCS, springer, 1998. Keywords: approximation, explicit network, from network, from sequences, galled tree, inapproximability, phylogenetic network, phylogeny, recombination. Note: http://dx.doi.org/10.1007/BFb0030789.



Hadas Birin,
Zohar Gal-Or,
Isaac Elias and
Tamir Tuller. Inferring Models of Rearrangements, Recombinations, and Horizontal Transfers by the Minimum Evolution Criterion. In WABI07, Vol. 4645:111-123 of LNCS, springer, 2007. Keywords: explicit network, from sequences, phylogenetic network, phylogeny, reconstruction. Note: http://safrabio.cs.tau.ac.il/download/Papers/Birin_et_al.pdf.





Cayla McBee. Generalizing Fourier Calculus on Evolutionary Trees to Splits Networks. In ISPAN'12, Pages 149-155, 2012. Keywords: abstract network, from sequences, phylogenetic network, phylogeny, split network, statistical model.
"Biologists have been interested in Phylogenetics, the study of evolutionary relatedness among various groups of organisms, for more than 140 years. In spite of this, it has only been in the last 40 years that advances in technology and the availability of DNA sequences have led to statistical, computational and algorithmic work on determining evolutionary relatedness between organisms. One method of determining historical relationships between organisms is to assume a group based evolutionary model and use a discrete Fourier transform. The 1993 paper 'Fourier Calculus on Evolutionary Trees' by L.A. Szekely, M.A. Steel and P.L. Erdos outlines this process. The transform presented in Szekely et al provides an invertible relationship between phylogenetic trees and expected frequencies of nucleotide patterns in nucleotide sequences. This implies that given a set of nucleotide sequences from various organisms it is possible to construct a phylogenetic tree that represents the historical relationships of those organisms. Some scenarios are poorly described by phylogenetic trees and there are biological and statistical reasons for using networks to model phylogenetic relationships. Given this motivation I have generalized Szekely et al's result to apply to a specific type of phylogenetic network known as a splits network. © 2012 IEEE."



Quan Nguyen and
Teemu Roos. Likelihood-based inference of phylogenetic networks from sequence data by PhyloDAG. In ALCOB15, Vol. 9199:126-140 of LNCS, springer, 2015. Keywords: BIC, explicit network, from sequences, likelihood, phylogenetic network, phylogeny, Program PhyloDAG, reconstruction, software. Note: http://www.cs.helsinki.fi/u/ttonteri/pub/alcob2015.pdf.



