1 CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
2 Graduate Program in Pharmacology, Columbia University, New York, New York 10032, USA
Reprint requests to: Burkhard Rost, CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168 Street, BB217, New York, New York 10032, USA; e-mail: rost{at}columbia.edu; fax: (212) 305-7932.
The genomes of more than 30 organisms have been sequenced entirely. Here, we applied a variety of simple bioinformatics tools to analyze 29 proteomes for representatives from all three kingdoms: eukaryotes, prokaryotes, and archaebacteria. We confirmed that eukaryotes have relatively more long proteins than prokaryotes and archaea, and that the overall amino acid composition is similar among the three. We predicted that 15%–30% of all proteins contained transmembrane helices. We could not find a correlation between the content of membrane proteins and the complexity of the organism. In particular, we did not find significantly higher percentages of helical membrane proteins in eukaryotes than in prokaryotes or archaea. However, we found more proteins with seven transmembrane helices in eukaryotes and more with six and 12 transmembrane helices in prokaryotes. We found twice as many coiled-coil proteins in eukaryotes (10%) as in prokaryotes and archaea (4%–5%), and we predicted 15%–25% of all proteins to be secreted by most eukaryotes and prokaryotes. Every tenth protein had no known homolog in current databases, and 30%–40% of the proteins fell into structural families with >100 members. A classification by cellular function verified that eukaryotes have a higher proportion of proteins for communication with the environment. Finally, we found at least one homolog of experimentally known structure for 20%–45% of all proteins; the regions with structural homology covered 20%–30% of all residues. These numbers may or may not suggest that there are 1200–2600 folds in the universe of protein structures. All predictions are available at http://cubic.bioc.columbia.edu/genomes.
Keywords: Protein sequence analysis; analyzing entire genomes; helical membrane proteins; coiled-coil proteins; signal peptides; comparative modeling
Abbreviations: 3D structure, three-dimensional structure (i.e., coordinates of all residues/atoms in a protein); COILS, prediction of coiled-coil regions from sequence based on statistics and expert rules; ORF, open reading frame (protein predicted by genome-sequencing project); PDB, protein data bank of protein structures; PHDhtm, profile-based neural network prediction of transmembrane helices; PSI-BLAST, fast and reliable database search method; SignalP, neural network based prediction of signal peptides; SWISS-PROT, curated database with protein sequences and functional annotations; TM, transmembrane helices; TrEMBL, automatic translation of EMBL nucleotide database of protein sequences.
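The abstract reports structural coverage in two ways: the share of proteins with at least one homolog of known structure (20%–45%) and the share of residues inside the homologous regions (20%–30%). The sketch below illustrates how these two statistics differ. It is a minimal illustration, not the authors' pipeline: the input layout (a dictionary of protein lengths and a dictionary of aligned regions per protein), the function names, and the toy numbers are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an aligned region is a 1-based,
# inclusive (start, end) residue range on the query protein.
Region = Tuple[int, int]


def covered_residues(regions: List[Region]) -> int:
    """Count residues covered by at least one region, merging overlaps."""
    total = 0
    current_end = 0  # last residue position already counted
    for start, end in sorted(regions):
        start = max(start, current_end + 1)  # skip residues counted before
        if end >= start:
            total += end - start + 1
            current_end = end
    return total


def coverage(
    lengths: Dict[str, int], hits: Dict[str, List[Region]]
) -> Tuple[float, float]:
    """Return (fraction of proteins with a hit, fraction of residues covered)."""
    proteins_with_hit = sum(1 for pid in lengths if hits.get(pid))
    residues_covered = sum(covered_residues(hits.get(pid, [])) for pid in lengths)
    return (
        proteins_with_hit / len(lengths),
        residues_covered / sum(lengths.values()),
    )


# Toy proteome (invented values, for illustration only).
toy_lengths = {"p1": 100, "p2": 200, "p3": 100}
toy_hits = {"p1": [(1, 50), (40, 80)], "p2": []}

protein_fraction, residue_fraction = coverage(toy_lengths, toy_hits)
print(f"proteins with a structural homolog: {protein_fraction:.1%}")
print(f"residues in homologous regions:     {residue_fraction:.1%}")
```

In the toy data, one protein of three has a hit, but the hit spans only part of that protein, so the residue-level fraction is lower than the protein-level fraction. The same effect explains why the two ranges quoted in the abstract need not agree.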