Genome, Transcriptome and Proteome Analysis


Alain Bernot
Généthon III, France

Translated by James McClellan and Susan Cure


First published in French as Analyse de Génomes, Transcriptome et Protéome © 2001 Dunod, Paris

Translated into English by James McClellan with excerpts translated by Susan Cure

This work has been published with the help of the French Ministère de la Culture-Centre national du livre

English language translation copyright © 2004 by John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England

Telephone (+44) […] (for orders and customer service enquiries): Visit our Home Page on […] or […]

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to permreq@wiley.co.uk, or faxed to (+44) […]

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices

John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN hardback X paperback

Typeset in 10.5 on 13 pt by Kolam Informations Services Pvt. Ltd, Pondicherry, India
Printed and bound in Great Britain by TJ International Ltd., Padstow, Cornwall
This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.

Contents

Preface
About the Author
Acknowledgements

1 General Introduction
  Review of Molecular Genetics: Nucleic acids; Constituents of the genome; Organelle genomes; The structure of genes; Transcription and translation; Genes, multigene families and conserved sequences; Coding and non-coding fractions of the genome
  The Tools of Molecular Biology: Restriction enzymes and electrophoresis; Cloning and construction of libraries; Hybridization techniques; Enzymatic amplification (PCR); Sequencing DNA and sequence analysis; Chromosomal assignment; Techniques specific to genomic analysis
  Specifics of the Genome Programmes: Creation of the genome programmes; Genome research centres; The growing importance of bioinformatics; Automation and robotics
  The Species Analysed: The main branches of life; Prokaryotes; Eukaryotes

2 Linkage Maps
  Tools and Methods in Genetic Mapping: The genetic maps of Drosophila and humans; Tools for genetic mapping in humans; RFLP markers; Microsatellites; Mapping methodology
  The Development of Genetic Maps: RFLPs and the first genetic maps; Microsatellites and modern maps; Progress of the human genetic map; SNP maps; Genetic maps of model and domestic species; Size of genetic maps
  Radiation Hybrid Maps: Principles of radiation hybrid mapping; The establishment of radiation hybrid maps; The first maps; Size of maps
  Conclusion

3 Physical Maps
  Local Maps or Small Genomes: Mapping tools; Physical mapping of model species; Local maps in humans
  Strategies for Physical Mapping of the Human Genome: The chromosome-specific approach; The whole-genome approach; STS mapping; Restriction mapping; Mapping by hybridization
  Maps of the Human Genome: The CEPH/Généthon map; The WI/MIT map; Chromosome-specific mapping; New generation human physical maps
  Conclusion

4 Genome Sequencing
  Strategic Choices: Approaches used for large-scale sequencing; Sequencing strategies; Organisms sequenced; Identifying genes
  Prokaryotic Genomes: Chromosome structure; Gene organization; Remarkable genes; Virulence; Non-coding sequences; Comparative genomics
  Genomes of Model Eukaryotes: Chromosome structure; Identification of genes; Functions of recognized or predicted genes; Genes specific to metazoans; Plant genomics; Homologues of genes responsible for human disease; Non-coding regions; Evolutionary genomics
  The Human Genome: Human chromosomes; Identification of genes; Repeated sequences; Evolution
  Other Genomes Sequenced: Plasmodium falciparum; Anopheles; Microsporidium; Leishmania major
  Conclusion

5 Sequencing cDNA and the Transcriptome
  Strategies of cDNA Sequencing: Posing the problem; Production and sequencing of cDNA; Choice of tissue of origin; Large-scale sequencing of cDNA
  The Economic Stakes: Data protection; The involvement of large pharmaceutical companies; The coining of partial sequences; The cDNA race
  The Analysis of cDNA Sequences: Assembly of partial sequences; Identification of new genes; The transcriptional map of the human genome; Identification of genes responsible for genetic diseases; cDNA programmes in animal and plant species
  The Transcriptome: Local analyses of transcription; Massive sequencing and the global vision of cellular physiology; Use of nucleic acid chips; The SAGE technique
  Conclusion

6 The Proteome
  Basic Techniques: Electrophoresis; Chromatography; Ultracentrifugation; Use of antibodies; Study of protein interactions; Informatics
  Transgenesis: Mouse transgenics; Examples of mouse transgenics; Transgenesis in other species
  Mutagenesis: Directed mutagenesis; Directed mutagenesis in mice (knock-out); Examples of knock-out in mice; Random mutagenesis
  Two-dimensional Electrophoresis and Identification of Proteins: Separation of proteins by two-dimensional electrophoresis; Protein identification by classic techniques; Identification of proteins by mass spectrometry; Examples of applications; Protein interactions
  Identification of Protein Interactions by Two-hybrid System: Strategy; Protein-protein interactions; RNA-protein interactions; Global approaches
  Chip Technology: Construction of protein chips; Strategies employed
  Analysis of Three-dimensional Structure: Crystallography; Nuclear magnetic resonance; Protein structure; Example: histocompatibility antigens
  Conclusion

7 Identification of Genes responsible for Disease
  Genetic Diseases: Monogenic and multifactorial diseases; Mutations; Cloning genes responsible for diseases
  Functional Cloning and Chromosomal Anomalies: Examples of functional cloning; Chromosomal anomalies; Mitochondrial diseases
  Strategy for Positional Cloning: Recruitment of affected families; Genetic mapping and primary localization; Physical mapping; Identification of genes present in a localization interval, and of the disease-causing gene; The first success of positional cloning
  What is the Future for the Cloning of Disease-causing Genes?: Monogenic diseases; Multifactorial diseases; Sequencing; Towards a redefinition of some genetic diseases; New mutational mechanisms; Genetic diseases and therapies
  Conclusion

General Conclusion
Further Reading
Index


Preface

The Genome Project began in […]. This project's aim is the analysis of the human genome, along with those of model organisms, to determine the location of all the genes in those genomes, and finally to establish their complete sequence. This objective is at the same time very simple, because a genome can be entirely described by the order of four letters A, T, G and C, and very complex, because the human genome contains more than three thousand million such letters.

This project is the first enterprise of truly international stature in the biomedical field. Because of its exceptional ambition it has been compared to the Apollo programme in space travel, to the quest for the Holy Grail, or to the establishment of the Periodic Table for biology.

The completion of this work will greatly advance our knowledge in the areas of both biology and health. The Genome Project should put at our disposal the necessary tools to understand and to treat numerous diseases having a degree of genetic predisposition. The data produced will also be of considerable interest for fundamental biology. The first results of the sequencing of model organisms already give a foretaste of this. As always in research, the results lead to at least as many questions as answers.

Since 1990, the development of the genome programmes has led to the establishment of genetic and physical maps, and the sequencing of the genomes of model organisms (or pathogens) and man. Complete sequencing of the genomes of representatives of the major groups of life has been achieved, and that of man, whilst currently partial, will probably be finished in […]. In parallel, programmes for the analysis of transcription (transcriptome) and of translation (proteome) have been developed both in model organisms and man.

This book summarizes the work already achieved, and anticipates the likely scale of the discoveries to come, with their implications for fundamental biology and medicine. It gives the most up-to-date synthesis of these scientific enterprises.

Alain Bernot

About the Author

Alain Bernot is a graduate of the École normale supérieure, holds a PhD, and is a Professor at the Université d'Evry. He is currently working in a genetic therapy programme at Généthon. He previously directed the sequencing and analysis of a vertebrate genome (Tetraodon nigroviridis), and contributed to the identification of a gene responsible for a human genetic disease (Mediterranean familial fever).