Biochemistry and Molecular Biology Department, University College, WC1E 6BT London, UK; Crystallography Department, Birkbeck College, WC1E 7HX, London, UK
Reprint requests to: Dr. Janet M. Thornton, Biochemistry and Molecular Biology Department, University College, Gower Street, London WC1E 6BT; e-mail: thornton{at}biochem.ucl.ac.uk; fax: 44-171-380-8499.
(RECEIVED November 8, 2000; ACCEPTED November 8, 2000)
Article and publication are at www.proteinscience.org/cgi/doi/10.1110/ps.90001.
The following article by Janet Thornton, a summary of her Neurath Award lecture at the Protein Science Symposium of August 2000, is the first of a series of invited articles that Protein Science will publish on the subject of proteomics and bioinformatics this year. Our sincere thanks to Professor Thornton and congratulations on her well-deserved award.
| Introduction |
|---|
|
|
|---|
| Chirality |
|---|
The first protein structures had revealed that the exclusive use of L-amino acids gives rise to the nonsymmetric Ramachandran plot, right-handed α-helices and left-handed twisted β-sheets. However, in 1977, inspection of protein folds led several groups to realize that chirality was also apparent at the higher levels of protein structure, leading to right-handed βαβ units (Fig. 1A) in parallel sheets (Sternberg and Thornton 1976; Nagano 1977; Richardson 1981). Our more recent automated analysis of chirality confirms the strength of this effect, with 95% of all βαβ units in 271 proteins adopting a right-handed orientation and only 1% left-handed (4% being classed as indeterminate; Slidel and Thornton 1995; Fig. 1B). In turn, chirality at the supersecondary structure level has a major influence on the overall topology of protein structures, generating, for example, right-handed four-helix bundles and left-handed TIM barrels (Fig. 1C). This cascade, from L-amino acids to complete topologies, reflects the fact that during folding, proteins are governed by the fundamental rules of physics and chemistry. Such a strong preference is very rare in proteins, with most rules (e.g., secondary structure propensities) being much less deterministic. However, as is often the case, nature surprised us when the structure of UDP-N-acetylglucosamine acyltransferase (Raetz and Roderick 1995) proved to be a left-handed parallel β helix, stabilized by a network of conserved interactions. Even with such strong rules there are exceptions.
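The handedness of a crossover connection can be expressed geometrically. The sketch below is an illustration only and is not the procedure used in the automated analysis cited above: it assumes Cα coordinates are available for the two parallel strands and for the connecting segment, and it uses the sign of a scalar triple product, with a hypothetical tolerance `tol` below which the unit is called indeterminate. The function name and the threshold are assumptions made for this example.

```python
import numpy as np

def crossover_handedness(strand1, strand2, connection, tol=0.1):
    """Classify a strand-connection-strand unit as 'right', 'left' or 'indeterminate'.

    Each argument is an (N, 3) array of C-alpha coordinates in chain order.
    Convention assumed here: the path is right-handed when the rotation from
    the strand direction (s) towards the connection (h) advances along the
    strand-to-strand displacement (d), i.e. (s x h) . d > 0.
    """
    strand1 = np.asarray(strand1, dtype=float)
    strand2 = np.asarray(strand2, dtype=float)
    connection = np.asarray(connection, dtype=float)

    c1 = strand1.mean(axis=0)
    c2 = strand2.mean(axis=0)
    s = strand1[-1] - strand1[0]                   # direction of the first strand
    d = c2 - c1                                    # displacement within the sheet
    h = connection.mean(axis=0) - 0.5 * (c1 + c2)  # offset of the connection from the sheet

    unit_vectors = []
    for v in (s, h, d):
        norm = np.linalg.norm(v)
        if norm < 1e-9:                            # degenerate geometry: no handedness
            return "indeterminate"
        unit_vectors.append(v / norm)
    s, h, d = unit_vectors

    triple = float(np.dot(np.cross(s, h), d))
    if abs(triple) < tol:                          # connection lies close to the sheet plane
        return "indeterminate"
    return "right" if triple > 0 else "left"
```

Mirroring the connection through the plane of the sheet flips the sign of the triple product and therefore the assigned handedness, which is the property the classification relies on.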
| Molecular interactions and energetics |
|---|
|
|
|---|
| Motifs |
|---|
Although it is now clear that folding is driven more by the burial of hydrophobic groups than by the formation of secondary structure units, motifs still provide a useful simplification to describe structures. In addition, they are often found to be the unit of inheritance during evolution, for example, the four Greek keys in the γ-crystallin structure (Blundell et al. 1981). As a corollary to studying the occurrence and structures of ββ hairpins, we analyzed the conformations of loops to explore whether they adopt preferred conformations in these common motifs. We found very specific short loop conformations in most of the common secondary structure motifs (Sibanda and Thornton 1985) that occur in many different unrelated structures and clearly reflect the underlying stability and geometry of these motifs. For example, we found that the unusual type I′ and II′ β turns occur almost exclusively in ββ hairpins, where their twist complements that of the β strands (see Fig. 2). Such observations highlight the importance of good local packing, which occurs in all structures, despite their tertiary or quaternary complexities. Many such commonly occurring loop motifs have since been recognized and have been successfully incorporated into recent attempts to fold proteins ab initio.
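To make the turn types concrete, the following sketch assigns a β-turn type from the backbone dihedral angles of its two central residues. It is not taken from the papers cited above: it assumes the commonly quoted ideal (φ, ψ) values for types I, I′, II and II′ and a uniform tolerance of 30° on every angle, which is a simplification chosen for this example. The names `IDEAL_TURNS` and `classify_turn` are hypothetical.

```python
# Commonly quoted ideal dihedral angles (degrees) for residues i+1 and i+2
# of a beta turn, in the order (phi1, psi1, phi2, psi2).
IDEAL_TURNS = {
    "I":   (-60.0,  -30.0, -90.0, 0.0),
    "I'":  ( 60.0,   30.0,  90.0, 0.0),
    "II":  (-60.0,  120.0,  80.0, 0.0),
    "II'": ( 60.0, -120.0, -80.0, 0.0),
}

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def classify_turn(phi1, psi1, phi2, psi2, tol=30.0):
    """Return the turn type whose ideal angles all lie within tol, else None.

    The uniform tolerance is an assumption made for this sketch.
    """
    observed = (phi1, psi1, phi2, psi2)
    for name, ideal in IDEAL_TURNS.items():
        if all(angle_diff(o, i) <= tol for o, i in zip(observed, ideal)):
            return name
    return None
```

Note that types I′ and II′ are the mirror images of types I and II in (φ, ψ) space, so they require positive φ at the first central residue.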
| Using stereochemical observations to help detect errors in structures |
|---|
angles in folded proteins to their preferred staggered states seen in amino acids in solution? On being asked to present a talk on errors in protein structures at a CCP4 meeting, we began a systematic investigation of the correlation of stereochemical parameters with crystallographic resolution.
Our studies revealed some unusual structures in the PDB that did not seem to agree with expected distributions. Most parameters we analyzed showed a strong correlation with resolution, excluding those fixed during refinement, which cannot be used as guides to the quality of the structure (Morris et al. 1992). More recent data, extracted from structures determined to atomic resolutions (Butterworth et al. 1998), highlight the remarkably tight distributions seen for many stereochemical parameters (see Fig. 3). Clearly, the conformation of the polypeptide backbone and side chains is close to the minimum energy configuration and is not significantly sacrificed during folding to the native state. These observations provided the basis for the suite of programs PROCHECK (Laskowski et al. 1993), which is now widely used to assess the stereochemical quality of structures. This powerful approach reflects only the physics and chemistry of the structures, with no reference to their biology or evolution.
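One such stereochemical parameter is a torsion angle and its deviation from the nearest staggered value. The sketch below shows how this could be computed from four atomic positions. It is a minimal illustration and not PROCHECK's implementation: the staggered reference values of -60°, 60° and 180° and the function names are assumptions made for this example.

```python
import numpy as np

STAGGERED = (-60.0, 60.0, 180.0)  # assumed ideal staggered torsion values, in degrees

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points, in the range (-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)              # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1          # b0 projected onto the plane normal to b1
    w = b2 - np.dot(b2, b1) * b1          # b2 projected onto the same plane
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def deviation_from_staggered(angle_deg):
    """Absolute angular distance (degrees) to the nearest staggered value."""
    return min(abs((angle_deg - ref + 180.0) % 360.0 - 180.0) for ref in STAGGERED)
```

Collecting `deviation_from_staggered` over all side chains of a structure gives a single distribution whose width could then be compared between structures solved at different resolutions.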
| Classification of protein families by structure: CATH |
|---|
As some folds are very similar, even when there is no sequence or functional evidence for an evolutionary relationship, we further clustered protein families if their topologies or folds were the same. To divide the structures at a higher level, we also assigned a class (secondary structure content) and architecture (how the secondary structures pack together regardless of sequence order) to each protein domain. This hierarchical classification, CATH (Orengo et al. 1997), representing class, architecture, topology, and homologous family, is based on sequence and structure comparisons using the SSAP algorithm (Taylor and Orengo 1989) combined with manual checking of the results. In the current version of CATH v1.7 (Pearl et al. 2001), there are almost 26,000 domains clustered into 1400 homologous families and 592 topologies. In the structurally classified families in the current CATH release, 25% of families are mainly α, 21% are mainly β, and >45% are αβ. Very few (<8%) domains have low secondary structure content. Some architectures and folds are disproportionately common. Of all homologous families, 60% adopt one of eight architectures and 30% of families adopt one of 10 folds (see Fig. 4). This probably reflects a combination of evolution, by which some families diverge beyond the level at which we can recognize members as being related, and physico-chemical effects, where unrelated sequences converge on the same fold. Other classification schemes (SCOP [Murzin et al. 1995] and FSSP [Holm and Sander 1994]) are based on similar principles, with the majority of assignments in agreement. This database is essential for all our research, which requires validated nonhomologous data sets to avoid bias.
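The four-level hierarchy lends itself to a simple data structure. The sketch below assumes that each domain carries a dotted four-number code of the form class.architecture.topology.homologous-family and counts domains at any chosen level. The class `CathCode`, the helper `count_at_level`, and the domain labels and codes in the usage example are hypothetical and serve only as an illustration; this is not the CATH database's own software.

```python
from collections import Counter
from typing import NamedTuple

class CathCode(NamedTuple):
    """One node address in a class/architecture/topology/homologous-family hierarchy."""
    klass: int
    architecture: int
    topology: int
    homologous_family: int

    @classmethod
    def parse(cls, text):
        """Parse a dotted code such as '3.40.50.720' into its four levels."""
        parts = text.strip().split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated numbers, got {text!r}")
        return cls(*(int(p) for p in parts))

    def prefix(self, level):
        """Return the first `level` numbers (1 = class ... 4 = homologous family)."""
        if not 1 <= level <= 4:
            raise ValueError("level must be between 1 and 4")
        return tuple(self[:level])

def count_at_level(codes, level):
    """Count how many codes share each prefix at the given level."""
    return Counter(code.prefix(level) for code in codes)

# Hypothetical domain labels and illustrative codes.
domains = {
    "domA": "3.40.50.720",
    "domB": "3.40.50.300",
    "domC": "1.10.8.10",
    "domD": "2.60.40.10",
}
codes = [CathCode.parse(c) for c in domains.values()]
topology_counts = count_at_level(codes, 3)
```

Counting at level 2 or level 3 in this way is how statements such as "60% of families adopt one of eight architectures" could be tallied from a list of classified families.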
| From sequence to structure: Optimal sequence threading for fold recognition |
|---|
|
|
|---|
| Molecular recognition |
|---|
|
|
|---|
| From structure to function |
|---|
class, presumably because the helices are most suited to creating a hydrophobic cavity for haem or binding into the major groove of DNA. The nucleotide binding domains are nearly all αβ proteins, which probably reflects a combination of chemistry, physics, and evolution.
In recent years, the vast expansion of the field, now known as bioinformatics, has changed what was a rather small esoteric area of biology into mainstream science with important applications in medicine and agriculture. To date, protein structures have been determined mainly to explain the known biological function of the protein. The advent of structural genomics and the development of new high-throughput approaches to structure determination will surely lead to a flood of new structures, some without an obvious function. Therefore, the challenge now moves from using the structure to understand biological mechanisms, which has not proved easy, to the even more difficult problem of using the structure to predict molecular interactions and complex biological function. This will involve identifying putative ligands, including small molecules and other proteins, and understanding much more about binding and catalysis. Combined with information from transcriptome and proteome analyses, the next few years will clearly reveal the molecular basis of many aspects of biological function. Hopefully it will also lead us toward rational protein design to allow proteins with novel functions to be engineered. What a glorious challenge for the next 20 years.
| Acknowledgments |
|---|
| References |
|---|
Blundell, T., Lindley, P., Miller, L., Moss, D., Slingsby, C., Tickle, I., Turnell, B., and Wistow, G. 1981. X-Ray analysis of γ-crystallin II. Nature 289: 771.
Butterworth, S., Dauter, Z., Dodson, E.J., Hooft, R.W.W., Kaptein, R., Lamzin, V.S., Laskowski, R.A., MacArthur, M.W., Murshudov, T.J., Oldfield, T.J., et al. 1998. Who checks the checkers? Four validation tools applied to eight atomic resolution structures. J. Mol. Biol. 276: 417–436.
Chothia, C. 1993. One thousand families for the molecular biologist. Nature 357: 543–544.
Chothia, C. and Lesk, A.M. 1986. The relation between divergence of sequence and structure in proteins. EMBO J. 5: 823–826.
Holm, L. and Sander, C. 1994. The FSSP database of structurally aligned protein fold families. Nucl. Acids Res. 22: 3600–3609.
Jones, D.T., Taylor, W.R., and Thornton, J.M. 1992. A new approach to protein fold recognition. Nature 358: 86–89.
Jones, S., Berman, H., and Thornton, J.M. 1999. Protein-DNA interactions: A structural analysis. J. Mol. Biol. 287: 877–896.
Jones, S. and Thornton, J.M. 1996. Principles of protein–protein interactions. Proc. Natl. Acad. Sci. 93: 13–20.
Karmirantzou, M. and Thornton, J.M. 1998. Computational approaches to protein ligand interactions: Protein-heme complexes. Rational molecular design in drug research, Alfred Benzon Symposium 42. 264–279.
Laskowski, R.A., MacArthur, M.W., Moss, D.S., and Thornton, J.M. 1993. PROCHECK: A program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26: 283–291.
Martin, A.C.R., Orengo, C.A., Hutchinson, E.G., Jones, S., Karmirantzou, M., Laskowski, R.A., Mitchell, J.B.O., Taroni, C., and Thornton, J.M. 1998. Protein folds and functions. Structure 6: 875–884.
McDonald, I.K. and Thornton, J.M. 1994. Satisfying hydrogen-bonding potential in proteins. J. Mol. Biol. 238: 777–793.
Mitchell, J.B.O., Laskowski, R.A., Alex, A., Forster, M., and Thornton, J.M. 1999. BLEEP: A potential of mean force describing protein–ligand interactions. II. Calculation of binding energies and comparison with experimental data. J. Comput. Chem. 20: 1177–1185.
Moodie, S.L., Mitchell, J.B.O., and Thornton, J.M. 1996. Protein recognition of adenylate: An example of a fuzzy recognition template. J. Mol. Biol. 263: 486–500.
Morris, A.L., MacArthur, M.W., Hutchinson, E.G., and Thornton, J.M. 1992. Stereochemical quality of protein structure coordinates. Proteins Struct. Funct. Genet. 12: 345–364.
Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of protein database for the investigation of sequences and structures. J. Mol. Biol. 247: 536–540.
Nagano, K. 1977. Logical analysis of the mechanism of protein folding. IV. Super-secondary structures. J. Mol. Biol. 109: 235–250.
Neurath, H. 1999. Proteolytic enzymes, past and future. Proc. Natl. Acad. Sci. 96: 10962–10963.
Orengo, C.A., Jones, D.T., and Thornton, J.M. 1994. Protein superfamilies and domain superfolds. Nature 372: 631–634.
Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., and Thornton, J.M. 1997. CATH: A hierarchic classification of protein domain structures. Structure 5: 1093–1108.
Pearl, F.M.G., Martin, N., Bray, J.E., Buchan, D.W.A., Harrison, A.P., Lee, D., Reeves, G.A., Shepherd, A.J., Sillitoe, I., Todd, A.E., et al. 2001. Nucleic Acids Res. (in press).
Ponstingl, H., Henrick, K., and Thornton, J.M. 2000. Discriminating between homodimeric and monomeric proteins in the crystalline state. Proteins Struct. Funct. Genet. 41: 47–57.
Raetz, C.R. and Roderick, S.L. 1995. A left-handed parallel β helix in the structure of UDP-N-acetylglucosamine acyltransferase. Science 270: 997–1000.
Richardson, J.S. 1981. The anatomy and taxonomy of protein structure. Adv. Prot. Chem. 34: 167–339.
Rossmann, M.G., Moras, D., and Olsen, K.W. 1974. Chemical and biological evolution of a nucleotide binding protein. Nature 250: 194.
Sternberg, M.J.E. and Thornton, J.M. 1976. On the conformation of proteins: The handedness of the connection between parallel β-strands. J. Mol. Biol. 110: 269–283.
Sibanda, B.L. and Thornton, J.M. 1985. β-Hairpin families in globular-proteins. Nature 316: 170–174.
Singh, J. and Thornton, J.M. 1990. Sirius: An automated-method for the analysis of the preferred packing arrangements between protein groups. J. Mol. Biol. 211: 595–615.
Sippl, M.J. 1990. Calculation of conformational ensembles from potentials of mean force: An approach to the knowledge-based prediction of local structures in globular proteins. J. Mol. Biol. 213: 859–883.
Slidel, T.W.F. and Thornton, J.M. 1995. Chirality in protein structure. In Protein folds: A distance based approach (ed. H. Bohr and S. Brunak), pp. 253–264. CRC, Boca Raton, FL.
Travis, J. 2000. Biography Hans Neurath. Biochim. Biophys. Acta 1477: 3–6.
Sweet, R.M., Wright, H.T., Janin, J., Chothia, C.H., and Blow, D.M. 1974. Crystal structure of the complex of porcine trypsin with soybean trypsin inhibitor (Kunitz) at 2.6-Å resolution. Biochemistry 13: 4212.
Taylor, W.R. and Orengo, C.A. 1989. Protein structure alignment. J. Mol. Biol. 208: 1–22.
Taroni, C., Jones, S., and Thornton, J.M. 2000. Analysis and prediction of carbohydrate binding sites. Prot. Eng. 13: 89–98.
Thornton, J.M. 1981. Disulphide bridges in globular proteins. J. Mol. Biol. 151: 261–287.
Todd, A.E., Orengo, C.A., and Thornton, J.M. 1999. Evolution of protein function, from a structural perspective. Curr. Opin. Chem. Biol. 3: 548–556.
Van Oostrum, J., Priestle, J.P., Gruetter, M.G., and Schmitz, A. 1991. The structure of murine interleukin-1β at 2.8 Å resolution. J. Struct. Biol. 107: 189–195.
Wilmot, C.M. and Thornton, J.M. 1988. Analysis and prediction of the different types of β-turn in proteins. J. Mol. Biol. 203: 221–232.