Mathematical Science and Genome Research

May 20 to 26, 1990

Organizers: L.H. Cox, E. Lander

Participants:

Craig Benham (Mount Sinai School of Medicine, New York, USA), Peter J. Bickel (University of California, Berkeley, USA), Lawrence H. Cox, organizer (National Academy of Sciences, Washington, USA), Walter Gilbert (Harvard University, Cambridge, USA), James I. Glimm (SUNY, Stony Brook, New York, USA), Eric S. Lander, organizer (MIT, Cambridge, USA), Eugene Myers (University of Arizona, Tucson, USA), Christopher Sander (European Molecular Biology Lab., Heidelberg, Germany), David Sankoff (University of Montreal, Canada), De Witt Sumners (Florida State University, USA), Simon Tavaré (University of Southern California, Los Angeles, USA), William Taylor (National Institute for Medical Research, London, UK), Bernard Teissier (Ecole normale supérieure, Paris, France), Michael S. Waterman (University of Southern California, Los Angeles, USA), Claude A. Weber (Université de Genève, Switzerland), James White (University of Southern California, Los Angeles, USA).

Summary

Since the 1980s, molecular biology has undergone a dramatic transformation into a highly data-rich discipline. Increasingly, the interpretation of new experimental results depends on comparing newly identified molecular sequences or structures with a large existing international database. As a result, molecular biology has come to depend on mathematical and computational tools to interpret the data generated in the laboratory. These new challenges have begun to give rise to a new field of interdisciplinary analysis.

The meeting brought together a number of the leading scientists involved in this new discipline to discuss the key problems in the field. Besides promoting the development of this emerging discipline, the meeting had a second goal: for the participants to begin drafting a first general account of the subject.

Following this colloquium, the book “Calculating the secrets of life: Applications of the mathematical sciences in molecular biology”, edited by Eric S. Lander and Michael S. Waterman, was published (Washington, National Academy Press, 1995).

Report

The scientific lectures were aimed at surveying and synthesizing developments in areas ranging from applications of topology to DNA recombination to applications of statistics to protein structure analysis. They included presentations by:

Craig Benham, who described the application of principles of mechanics to DNA structure. The torsional stresses on DNA are believed to affect the ease with which enzymes may open the double helix, and thus this physical property of DNA domains is likely to have important bearing on the regulation of gene expression.

Walter Gilbert, who described the Human Genome Project broadly and its implications for the understanding of the evolution of proteins. Genes consist of continuous blocks called exons interrupted by intervening sequences called introns. According to one influential theory, these exons are the descendants of the ancestral building blocks of all modern proteins. From the collection of exons that have been sequenced to date, it is possible (by examining the frequency with which related sequences are seen) to make inferences about the original number of such building blocks in the primordial soup. Such estimates range from perhaps 4,000 to 20,000, and are low enough to suggest that it may be possible to assemble a thesaurus of the basic motifs underlying protein structure.

Eric Lander, who described the application of statistical methods to genetic mapping of complex traits. With the availability of complete genetic maps of the genomes of higher organisms, it is now possible to follow the inheritance pattern of all chromosomal regions simultaneously. In turn, this has made it possible to find multiple genes which act together to produce polygenic traits, which are of importance in both agriculture and medicine.

Eugene Myers, who described the application of computer science to the problem of analyzing large databases of DNA and protein sequences. With new algorithms, it is now possible to search a database in sublinear time (in the size of the database) to find even partial and imperfect matches. These new algorithms are rapidly becoming the tool of choice for biologists trying to understand the function of newly discovered proteins.
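As a rough illustration of why an index lets a search avoid scanning the whole database, the following minimal sketch looks up short exact "seed" words of a query in a precomputed table. It is not the algorithm presented in the talk; the function names, the word length k and the toy sequences are assumptions made for illustration only.

```python
from collections import defaultdict


def build_kmer_index(database, k):
    """Map every length-k word of each database sequence to its locations."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(database):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def find_seed_hits(query, index, k):
    """Return (seq_id, db_pos, query_pos) for every k-word shared with the query."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # Dictionary lookup: cost depends on the query, not on the database size.
        for seq_id, dpos in index.get(query[qpos:qpos + k], []):
            hits.append((seq_id, dpos, qpos))
    return hits


# Toy database and query (hypothetical data).
database = ["ACGTACGTGA", "TTGACCATGC", "GGGACGTTTT"]
index = build_kmer_index(database, k=4)
hits = find_seed_hits("CCACGTAA", index, k=4)
print(hits)
```

Each hit is only a candidate: a real tool would extend the seeds into partial and imperfect matches and score them.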

Chris Sander, who spoke about methods for studying the three-dimensional shape of proteins based upon their one-dimensional sequences. One especially powerful approach is the construction of models starting from previously known protein structures. The approach raises natural questions about the relationship between the degree of similarity of protein sequences and protein structure.

David Sankoff, who spoke about the application of mathematical methods to the study of evolutionary trees. Reconstructing the correct evolutionary tree is a difficult problem, because of the large number of possible topologies (branching orders) and geometries (branch lengths) for the tree. A developing theory of "invariants" is beginning to bring order to such problems, starting with simple linear invariants and now growing to include a richer set of quadratic invariants.
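To give a sense of how quickly the number of topologies grows, the sketch below uses the standard count of unrooted binary trees on n labeled leaves, (2n-5)!! = 1 x 3 x 5 x ... x (2n-5). The function name is an assumption for illustration; the invariants themselves are not shown.

```python
def count_unrooted_binary_trees(n_leaves):
    """Number of unrooted binary tree topologies on n labeled leaves: (2n-5)!!"""
    if n_leaves < 3:
        return 1  # with one or two leaves there is only one possible shape
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5.
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_binary_trees(n))
```

Ten species already allow more than two million branching orders, which is why exhaustive comparison of trees quickly becomes impractical.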

De Witt Sumners, who spoke about the application of topology and geometry to DNA recombination. By examining purely mathematical considerations, it is possible to make detailed inferences about the action of DNA recombination enzymes such as Tn3 Resolvase which cannot be gained through experimental methods (at least to date). Interesting open questions pertaining to the extraordinary tangles of trypanosome kinetoplast DNA were also discussed.

Simon Tavaré, who spoke about statistical and probabilistic methods in interpreting evolutionary and population genetic data. The ability to read DNA and protein sequences from many individuals in a population raises the possibility of reconstructing evolutionary or population genetic history. However, accurate reconstruction requires an understanding of the complex "random walks" which ensembles of genes undergo over time.

William Taylor, who spoke about methods for studying the three-dimensional shape of proteins based upon their one-dimensional sequences. A spectrum of methods is now available, capable of detecting weaker and weaker similarities by comparing multiple members of an evolutionary family. These similarities help to identify "consensus sequences" that point to the crucial features involved in maintaining a particular structure.
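The idea of a consensus sequence can be sketched as a column-by-column majority vote over an already aligned family. This is a deliberately simple illustration, not the methods discussed in the talk; the function name, the 60% threshold, the wildcard symbol and the toy sequences are all assumptions.

```python
from collections import Counter


def consensus(aligned_sequences, min_fraction=0.6, wildcard="x"):
    """Majority-vote consensus of equal-length, pre-aligned sequences.

    A column keeps its most common residue only if that residue occurs in
    at least min_fraction of the sequences; otherwise it becomes a wildcard.
    """
    if not aligned_sequences:
        raise ValueError("need at least one sequence")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned_sequences):
        residue, count = Counter(column).most_common(1)[0]
        result.append(residue if count / len(column) >= min_fraction else wildcard)
    return "".join(result)


# Hypothetical aligned family of short peptide fragments.
family = ["CAHCG", "CSHCG", "CTHCA", "CAYCG"]
print(consensus(family))
```

Positions that survive the vote are the conserved ones, which are the natural candidates for residues that maintain the structure.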

Michael Waterman, who spoke about the application of the probabilistic theory of large deviations to the analysis of DNA and protein sequences. When a newly discovered DNA or protein sequence is compared to a large database of previously known sequences, one must examine the best observed matches to determine whether they might simply have occurred by chance. The length and character of the best such matches turn out to be governed by interesting laws that exhibit a phase transition as the definition of a match is weakened. Understanding this phase transition points out the regions of most interest for biological study.
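The simplest case of such a law concerns exact matches: for two random sequences of lengths m and n whose letters agree with probability p, the longest exact match grows like the logarithm of mn to base 1/p. The sketch below compares that scale with a simulation. It covers only this strictest definition of a match, not the phase transition itself, and the function names, sequence lengths and uniform letter model are assumptions for illustration.

```python
import math
import random


def longest_common_run(a, b):
    """Length of the longest exact match (common substring) between a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous pair of positions.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def expected_chance_run(m, n, p_match=0.25):
    """Leading-order scale of the longest chance match: log base 1/p of m*n."""
    return math.log(m * n) / math.log(1.0 / p_match)


# Two unrelated random DNA sequences with uniform letter frequencies.
rng = random.Random(0)
m = n = 500
x = "".join(rng.choice("ACGT") for _ in range(m))
y = "".join(rng.choice("ACGT") for _ in range(n))
observed = longest_common_run(x, y)
predicted = expected_chance_run(m, n)
print(f"observed longest chance match: {observed}, log-law scale: {predicted:.1f}")
```

A match much longer than this scale is unlikely to be a coincidence, which is the sense in which such laws separate chance similarity from biological signal.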

James White, who spoke about the application of topology to the study of DNA structure bound to proteins. The concepts of linking, writhing and twisting of curves and the application of differential topological invariants allow a sophisticated understanding of the wrapping of DNA around such structures as nucleosomes, which package DNA in the nucleus.

In addition, the participants produced the outline of a six-chapter book on mathematics and biology. Completion and publication of the book is planned by early 1993, and would represent a permanent contribution of this important first gathering at Les Treilles.

Eric Lander

 
