Field Guidelines for Genetic Experimental Designs in High-Throughput Sequencing



Editors: Ana M. Aransay and José Luis Lavín Trueba

Editors

Ana M. Aransay
Genome Analysis Platform, CIC bioGUNE, Derio, Spain
Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Madrid, Spain

José Luis Lavín Trueba
Genome Analysis Platform, CIC bioGUNE, Derio, Spain

ISBN ISBN (ebook) DOI / Library of Congress Control Number:

Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature. The registered company is Springer International Publishing AG Switzerland.

Preface

High-throughput sequencing (HTS), also called next-generation sequencing (NGS) or massively parallel sequencing (MPS), is a remarkably fast-evolving field. Since 2005, when 454 Life Sciences released the first HTS instrument to the market, dozens of companies have developed a variety of methods with distinct characteristics, and each protocol should therefore be applied wisely. Aware of the wide range and complexity of the reported HTS strategies, we observed a lack of bibliographic support for scientists who need to choose the most suitable methodology or combination of platforms and to define experimental designs that achieve unambiguous aims. Genomics core facilities can give limited advice on which technology fits one's purposes, and the number of cloud-based HTS data analysis pipelines that process raw output data in a standard mode is rapidly increasing. Ideally, scientists who request such services should be clear about the key questions concerning wet-lab procedures and data analysis. Thus, the purpose of this guideline is to collect in a single volume all the aspects that should be taken into account, and the reasons behind them, when HTS technologies are incorporated into a scientific research project. It is directed to specialists but primarily to newcomers. Accordingly, the book comprises a brief introduction to the challenges of HTS technologies, followed by 14 chapters with expert discussions and recommendations for selecting the best among the available workflows for sample processing, alignment of results, downstream data analysis algorithms, etc., and the minimum number of samples that should be characterized in each assay to accurately sequence and interpret genomes, sets of RNA molecules, methylated DNA regions, nucleic acids interacting with targeted proteins, metagenomes, metatranscriptomes, and/or single-cell contents.
Moreover, examples of several successful strategies are analyzed to highlight the crucial features. Whole-genome sequencing (WGS) wet-lab procedures and data analyses are described in Chap. 2, followed in Chap. 3 by a description of how to approach the characterization of partial genomes (i.e., genes of interest) across a number of samples. In addition, a variety of sequencing library preparation approaches and analysis pipelines to catalogue transcriptomes, sets of noncoding RNAs and small RNAs, as well as ribosome-bound RNAs under particular conditions, are detailed in the subsequent chapters. Furthermore, ways of studying epigenetic events such as DNA methylation and interactions of DNAs or RNAs with targeted proteins are illustrated in Chaps. 9, 10, and 11, respectively. Chapters 12 and 13 discuss the classification of environmental genomes and transcriptomes (e.g., those of microbial communities) by means of metagenomics and metatranscriptomics. Likewise, the hot topic of single-cell DNA and RNA content characterization is considered in Chaps. 14 and 15. The last chapter of the book, Chap. 16, is a detailed protocol for submitting HTS data to public repositories, as required when such results are published.

As a special feature, each chapter includes a quick reference guide as an appendix, where readers can see at a glance a figure representing the main steps of the wet-lab and bioinformatic workflows, a table of experimental design recommendations for the techniques described, and another table listing the recommended bioinformatic analysis software together with the results yielded by each program. This section is intended to give rapid access to a summary of the principles of each methodology described.

Considering that HTS technologies can be applied to a vast variety of biological questions and are used by scientists working in diverse fields such as biology, medicine, or ecology, and across a wide range of taxonomic groups (mammals, plants, bacteria, viruses, etc.), we hope that this book will be a valuable resource for all scientists who lack skills in HTS and intend to incorporate such technologies into their research.

Derio, Spain    Ana M. Aransay
Derio, Spain    José Luis Lavín-Trueba

Contents

1 The High-Throughput Sequencing Technologies Triple-W Discussion: Why Use HTS, What Is the Optimal HTS Method to Use, and Which Data Analysis Workflow to Follow
  José Luis Lavín Trueba and Ana M. Aransay
2 Whole-Genome Sequencing Recommendations
  Toni Gabaldón and Tyler S. Alioto
3 Targeted DNA Region Re-sequencing
  Karolina Heyduk, Jessica D. Stephens, Brant C. Faircloth, and Travis C. Glenn
4 Transcriptome Profiling Strategies
  Abdullah M. Khamis, Vladimir B. Bajic, and Matthias Harbers
5 Differential mRNA Alternative Splicing
  Albert Lahat and Sushma Nagaraja Grellscheid
6 microRNA Discovery and Expression Analysis in Animals
  Bastian Fromm
7 Analysis of Long Noncoding RNAs in RNA-Seq Data
  Farshad Niazi and Saba Valadkhan
8 Ribosome Profiling
  Anze Zupanic and Sushma Nagaraja Grellscheid
9 Genome-Wide Analysis of DNA Methylation Patterns by High-Throughput Sequencing
  Tuncay Baubec and Altuna Akalin
10 Characterization of DNA-Protein Interactions: Design and Analysis of ChIP-Seq Experiments
  Rory Stark and James Hadfield
11 PAR-CLIP: A Genomic Technique to Dissect RNA-Protein Interactions
  Tara Dutka, Aishe A. Sarshad, and Markus Hafner
12 Metagenomic Design and Sequencing
  William L. Trimble, Stephanie M. Greenwald, Sarah Owens, Elizabeth M. Glass, and Folker Meyer
13 A Hitchhiker's Guide to Metatranscriptomics
  Mariana Peimbert and Luis David Alcaraz
14 Eukaryotic Single-Cell mRNA Sequencing
  Kenneth J. Livak
15 Eukaryotic Single-Cell DNA Sequencing
  Keith E. Szulwach and Kenneth J. Livak
16 Submitting Data to a Public Repository, the Final Step of a Successful HTS Experiment
  Christopher O'Sullivan and Jonathan Trow
Index

Contributors

Altuna Akalin, Ph.D.
  Bioinformatics Platform, Berlin Institute for Medical Systems Biology, Max Delbrück Centre, Berlin, Germany
Luis David Alcaraz
  Departamento de Ecología de la Biodiversidad, LANCIS, Instituto de Ecología, Universidad Nacional Autónoma de México, Coyoacán, Cd. Mx., México
Tyler S. Alioto, B.S., Ph.D.
  Centro Nacional de Análisis Genómico, Centre de Regulació Genòmica, Barcelona, Spain
Ana M. Aransay, Ph.D.
  Genome Analysis Platform, CIC bioGUNE, Derio, Spain; Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Madrid, Spain
Vladimir B. Bajic, Ph.D.
  Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Tuncay Baubec, Ph.D.
  Epigenomics and Chromatin Biology Lab, Institute of Veterinary Biochemistry and Molecular Biology, University of Zurich, Zurich, Switzerland
Tara Dutka
  Laboratory of Muscle Stem Cells and Gene Regulation, NIAMS, Bethesda, MD, USA
Brant C. Faircloth, Ph.D.
  Department of Biological Sciences and Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
Bastian Fromm, Ph.D.
  Department of Tumor Biology, Institute for Cancer Research, Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
Toni Gabaldón, Ph.D.
  Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
Elizabeth M. Glass
  Argonne National Laboratory, Argonne, IL, USA
Travis C. Glenn
  Department of Environmental Health Science, University of Georgia, Athens, GA, USA
Stephanie M. Greenwald
  Institute for Genomics and Systems Biology, Argonne, IL, USA
Sushma Nagaraja Grellscheid, Ph.D.
  School of Biological and Biomedical Sciences, Durham University, Durham, UK
James Hadfield, B.Sc., Ph.D.
  Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
Markus Hafner
  Laboratory of Muscle Stem Cells and Gene Regulation, NIAMS, Bethesda, MD, USA
Matthias Harbers, Ph.D.
  Division of Genomic Technologies, RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan
Karolina Heyduk
  Department of Plant Biology, University of Georgia, Athens, GA, USA
Abdullah M. Khamis, M.Sc.
  Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Albert Lahat, B.Sc.
  School of Biological and Biomedical Sciences, Durham University, Durham, UK
José Luis Lavín Trueba, Ph.D.
  Genome Analysis Platform, CIC bioGUNE, Derio, Spain
Kenneth J. Livak, Ph.D.
  Fluidigm Corporation, South San Francisco, CA, USA
Folker Meyer
  Argonne National Laboratory, Argonne, IL, USA
Farshad Niazi, M.D.
  Department of Molecular Biology and Microbiology, Case Western Reserve University School of Medicine, Cleveland, OH, USA
Christopher O'Sullivan
  National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD, USA
Sarah Owens
  Argonne National Laboratory, Argonne, IL, USA
Mariana Peimbert
  Departamento de Ciencias Naturales, Universidad Autónoma Metropolitana Unidad Cuajimalpa, Cuajimalpa, Cd. Mx., México
Aishe A. Sarshad
  Laboratory of Muscle Stem Cells and Gene Regulation, NIAMS, Bethesda, MD, USA
Rory Stark, B.A., M.Sc., M.Phil., D.Phil.
  Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
Jessica D. Stephens
  Department of Plant Biology, University of Georgia, Athens, GA, USA
Keith E. Szulwach, Ph.D.
  Fluidigm Corporation, South San Francisco, CA, USA
William L. Trimble
  Institute for Genomics and Systems Biology, Argonne, IL, USA
Jonathan Trow
  National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD, USA
Saba Valadkhan, M.D., Ph.D.
  Department of Molecular Biology and Microbiology, Case Western Reserve University School of Medicine, Cleveland, OH, USA
Anze Zupanic, Ph.D.
  Department of Environmental Toxicology, Eawag, Swiss Federal Institute for Aquatic Research and Technology, Dübendorf, Switzerland