T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension
Paolo Di Tommaso,1 Sebastien Moretti,2,3 Ioannis Xenarios,2 Miquel Orobitg,4 Alberto Montanyola,4 Jia-Ming Chang,1 Jean-François Taly,1 and Cedric Notredame1,*

1Centre For Genomic Regulation (Pompeu Fabra University), Carrer del Doctor Aiguader 88, 08003 Barcelona, Spain, 2Vital-IT, Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Génopode, 1015 Lausanne, Switzerland, 3Department of Ecology and Evolution, Biophore, Lausanne University, CH-1015 Lausanne, Switzerland and 4Department of Computer Science and Industrial Engineering, University of Lleida, Campus de Cappont, C. de Jaume II 69, E-25001 Lleida, Spain

Received 2011 Feb 18; Revised 2011 Mar 23; Accepted 2011 Apr 5.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides easy and intuitive access to the most popular functionalities of the package. These include the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode that allows combining the output of any other aligners, and the template-based modes of T-Coffee that deliver high-accuracy alignments using structural or homology-derived templates. The three available template modes are Expresso for the alignment of proteins with a known 3D structure, R-Coffee for the alignment of RNA sequences with conserved secondary structures, and PSI-Coffee for the accurate alignment of distantly related sequences using homology extension. The new server benefits from recent improvements of the T-Coffee algorithm and can align up to 150 sequences of up to 10 000 residues each. It is available from both http://www.tcoffee.org and its main mirror http://tcoffee.crg.cat.

INTRODUCTION

As judged by citation index, multiple sequence alignment (MSA) is one of the most widely used techniques in biology. Indeed, the multiple comparison of homologous sequences has applications in almost all fields of modern biology, from simple data monitoring up to sophisticated modeling such as structure prediction and phylogenetic reconstruction. In the past 20 years, more than 50 aligners have been published (1), a wide diversity of choices that mostly reflects the lack of a universal method that unambiguously solves the multiple sequence alignment problem. It is indeed a complex task that stands at the interface between computer science and biology. The biological problem is the definition of a mathematical formula (objective function) that accurately quantifies the biological relationship between two sequences on the basis of their alignment. The computational problem is the estimation of an optimal model with respect to the objective function. In practice, the objective functions described so far have difficulties accurately modeling the homology between protein sequences having <30% identity (70% in the case of nucleic acids). These functions are not only limited in accuracy but also difficult to optimize: it has been shown that, for the most commonly used functions, the computation of an optimal multiple sequence alignment is an NP-complete problem (2). The lack of an exact solution has prompted the development of a large number of heuristic solutions, focused on the design of novel objective functions (3,4), the improvement of the optimization algorithm (5,6) or a trade-off between accuracy and speed (7).
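
To make the notion of an objective function concrete, the following minimal sketch scores a multiple alignment with a simple sum-of-pairs function. It is only an illustration: the function name and the match, mismatch and gap values are assumptions chosen for readability, not parameters used by T-Coffee or by any of the aligners cited here.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score an MSA given as a list of equal-length gapped strings.

    Every column contributes the sum of the scores of all residue pairs
    it contains. The scoring values are illustrative assumptions.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    score = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # a gap facing a gap carries no information
            if a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score
```

Evaluating such a function for a given alignment is cheap; the difficulty described above lies in finding, among all possible alignments, the one that maximizes it.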

T-Coffee (8) belongs to the class of aligners known as consistency based, which may be described as slow and accurate. These include ProbCons (9), PCMA (10), MAFFT (the slow accurate mode) (11), PROMALS (12) and Pecan (13). All the aligners of this class trade speed for increased precision. Over the years, they have been shown by many independent studies to outperform their simpler counterparts in terms of accuracy. While one may debate the value of a modest but significant increase in accuracy at a sometimes prohibitive CPU cost, one should not overlook what is probably the main advantage of consistency-based protocols: their integrative capacity. In the consistency-based framework, the considered sequences are not directly integrated in a multiple sequence alignment. They are first aligned using any suitable combination of third-party aligners. The resulting collection of alignments (named a library in T-Coffee) is then turned into a multiple sequence alignment using a position-specific scoring scheme derived from the library (consistency-based progressive algorithm). In practice, the way the library is computed defines most of the variations around T-Coffee. The first version used a combination of ClustalW all-against-all pair-wise alignments and Lalign all-against-all local alignments, whereas the current version uses all-against-all pair-wise alignments computed with a pair HMM (9). There is no limit on how many methods and what kind of methods may be combined this way. One can even use pre-existing multiple sequence alignment methods, as in the M-Coffee protocol (14), where the library is made of a collection of multiple sequence alignments produced with third-party multiple aligners. This approach is becoming increasingly popular as it makes it possible to compare and combine the output of several aligners, thereby simplifying the software selection dilemma. This possibility is important at a time when concerns are growing about the non-neutrality of alignment methods towards subsequent modeling (15). When combining multiple sequence alignments, one can either combine all existing methods or only a selected subset. For instance, since 2009, the Compara component of ENSEMBL has used M-Coffee to combine the output of three fast aligners (MAFFT, MUSCLE and Kalign) in order to produce the MSAs needed for the computation of the reference trees. On the server, users can use check-boxes to select the methods they want to combine. Our aim is to integrate as many public methods as possible and we welcome user requests for unsupported methods. We currently have an interface with eight popular aligners.
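
The library idea can be sketched in a few lines. The code below is a simplified illustration, not the T-Coffee implementation: it stores every aligned residue pair found in a set of pair-wise alignments, then re-scores a pair by adding the support it receives through intermediate residues of other sequences, which is the intuition behind consistency. The data layout, the uniform weight of 1.0 and all function names are assumptions made for this example.

```python
from collections import defaultdict


def pairs_from_alignment(name_a, row_a, name_b, row_b):
    """Yield ((name_a, i), (name_b, j)) for each pair of aligned residues."""
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            yield (name_a, i), (name_b, j)
        if a != "-":
            i += 1
        if b != "-":
            j += 1


def build_library(pairwise_alignments, weight=1.0):
    """Accumulate a symmetric library: residue -> {partner residue: weight}."""
    library = defaultdict(dict)
    for name_a, row_a, name_b, row_b in pairwise_alignments:
        for x, y in pairs_from_alignment(name_a, row_a, name_b, row_b):
            library[x][y] = library[x].get(y, 0.0) + weight
            library[y][x] = library[y].get(x, 0.0) + weight
    return library


def extended_weight(library, x, y):
    """Direct weight of (x, y) plus the support through every intermediate z."""
    direct = library.get(x, {}).get(y, 0.0)
    via = sum(
        min(w_xz, library[z][y])
        for z, w_xz in library.get(x, {}).items()
        if z != y and y in library.get(z, {})
    )
    return direct + via
```

Because the library only records residue pairs, alignments produced by different methods (pair-wise or multiple, as in M-Coffee) can be poured into the same structure, for instance by calling `build_library` once on the pooled list of pair-wise projections.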

The most recent improvement in T-Coffee has been the development of the concept of template-based multiple sequence alignment (16,17). When run as a template-based aligner, T-Coffee uses a different procedure to generate the primary library. Rather than directly aligning the sequences, it associates each input sequence with a template, aligns every pair of templates with an appropriate aligner and then projects the resulting alignments onto the original sequences.
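
The projection step can be illustrated as follows. This is a minimal sketch under stated assumptions: each sequence is assumed to come with a position map onto its template, and the template aligner is assumed to return a list of aligned template positions. The function name and data layout are hypothetical.

```python
def project_template_pairs(template_pairs, seq_to_tmpl_a, seq_to_tmpl_b):
    """Map aligned template positions back onto the two original sequences.

    template_pairs: iterable of (position in template A, position in template B)
    seq_to_tmpl_a, seq_to_tmpl_b: dicts {sequence position: template position}
    Template positions with no counterpart in the sequence are skipped.
    """
    tmpl_to_seq_a = {t: s for s, t in seq_to_tmpl_a.items()}
    tmpl_to_seq_b = {t: s for s, t in seq_to_tmpl_b.items()}
    projected = []
    for ta, tb in template_pairs:
        if ta in tmpl_to_seq_a and tb in tmpl_to_seq_b:
            projected.append((tmpl_to_seq_a[ta], tmpl_to_seq_b[tb]))
    return projected
```

The projected pairs are ordinary sequence-level residue pairs, so they can enter the library exactly like pairs obtained from a direct sequence alignment.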

The new server offers three template-based alignment modes: one for RNA sequences (R-Coffee), one for proteins with a known structure (Expresso) and one for the alignment of distantly related sequences (PSI-Coffee). R-Coffee uses as templates RNA secondary structure predictions obtained by applying the RNAplfold prediction algorithm to the considered sequences. The primary library is then produced using any user-defined combination of aligners and extended using the predicted secondary structures. Expresso uses Protein Data Bank (PDB) 3D structures as templates. For each input sequence, putative templates are identified by a BLAST search against the sequences of the PDB, followed by the selection of the best hit (>30% identity over >50% of the query sequence). The library is then computed by aligning every pair of templates with a structural aligner. SAP (19) is used by default, although users can select other structural aligners or combine them. Whenever a sequence lacks a closely related structure, the standard pair-wise sequence alignment procedure (proba_pair) is used for all the pair-wise alignments involving this sequence. Once the library is compiled, the alignment is produced using the standard T-Coffee algorithm.
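
The Expresso template selection rule can be written down as a short filter. The sketch below is an illustration only: the `BlastHit` record, its field names and the use of the bit score to rank eligible hits are assumptions, since the text only specifies the identity and coverage thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical summary of one BLAST hit against PDB sequences."""
    template_id: str
    percent_identity: float
    aligned_query_length: int
    bit_score: float


def pick_template(hits, query_length, min_identity=30.0, min_coverage=50.0):
    """Return the best hit with >30% identity over >50% of the query, or None.

    Ranking eligible hits by bit score is an assumption of this sketch.
    """
    eligible = [
        h for h in hits
        if h.percent_identity > min_identity
        and 100.0 * h.aligned_query_length / query_length > min_coverage
    ]
    return max(eligible, key=lambda h: h.bit_score, default=None)
```

A return value of `None` corresponds to the case described above where a sequence lacks a closely related structure and the pair-wise sequence alignment procedure is used instead.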

PSI-Coffee is a novel mode of T-Coffee (manuscript in preparation). It uses protein profiles as templates rather than structures and works as follows: each sequence is BLASTed individually against the NR database and the resulting BLAST alignments (i.e. one-to-all between each query and its hits) are turned into profiles (sequences with identity <30% or coverage <40% are excluded). Given a set of N sequences, the result is a collection of N profiles, each embedding a distinct query sequence. The profiles are then aligned two by two (using the proba_pair pair-HMM) and the resulting alignment of the query sequences is added to the library. The rest of the procedure uses the standard T-Coffee methodology to deliver a consistency-based MSA. The principle of PSI-Coffee is very similar to that of PROMALS (20). Its main advantage is its reliance on homology extension for the computation of the library, a process shown by us and others to improve the alignment of remote homologues (1,20). It should be noted that the homologous sequences identified by BLAST are not added to the final MSA. They are only used to increase the accuracy of the underlying alignment.
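
The profile construction step can be sketched as follows. This is a simplified illustration under stated assumptions: each BLAST hit is assumed to be already laid out on the query coordinates (same length as the query, with `-` where the hit has no residue), and the profile is represented as plain residue frequencies per query position. The function name and input layout are hypothetical.

```python
from collections import Counter


def build_profile(query, hits, min_identity=30.0, min_coverage=40.0):
    """Build a per-position residue frequency profile anchored on the query.

    hits: list of (aligned_row, percent_identity, percent_coverage).
    Hits with identity <30% or coverage <40% are excluded, as in the text.
    """
    if "-" in query:
        raise ValueError("the query is expected to be ungapped")
    rows = [query]
    for row, identity, coverage in hits:
        if identity < min_identity or coverage < min_coverage:
            continue
        if len(row) != len(query):
            raise ValueError("hits must be laid out on the query coordinates")
        rows.append(row)
    profile = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != "-")
        total = sum(counts.values())  # at least 1: the query residue
        profile.append({res: n / total for res, n in counts.items()})
    return profile
```

Since every profile has exactly one column per query residue, aligning two profiles directly yields an alignment of the two embedded query sequences, which is the part that enters the library.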

All the aforementioned alignment procedures are now available via the web server described in this article. Its main strength is to offer the most sophisticated modes of T-Coffee for the production of highly accurate sequence alignments. Some of these modes integrate complex components such as BLAST database searches and secondary structure predictions, yet thanks to the web server, users do not need to install, maintain or integrate these resources.
