Next Generation Sequencing Activities

1 Next Generation Sequencing Activities
Department of Control and Computer Engineering, Politecnico of Turin, Italy
Meeting Politecnico of Turin - EBRI Foundation, 2 July 2013
PACIELLO Giulia on behalf of FICARRA Elisa

2 Department of Control and Computer Engineering (DAUIN) Point of reference in Politecnico di Torino for the area of Information and Communication Technologies (ICT); Promotes and manages basic and applied research, training, technology transfer and services in the areas of systems and control engineering, computer science and computer engineering and operations research; 14 research laboratories, more than 60 researchers, about 100 PhD students and research collaborators. RNA-seq Workshop, March 2012 Francesco Abate - Politecnico di Torino 1

3 EDA Group
EDA (Electronic Design Automation) Group: 2 full professors, 1 associate professor, 3 assistant professors, 6 post-doctoral researchers, 8 PhD students, 5 research assistants, 3 secretaries.
CONTACTS Politecnico di Torino, DAUIN, Corso Duca Degli Abruzzi, , Torino, Italy Tel Fax (secretariat): segreteria.macii@polito.it
Three main research areas:
Computer-aided design of digital electronic circuits and systems, with particular emphasis on methodologies, algorithms and tools for power estimation and optimization of systems;
Smart city and Smart systems, with particular emphasis on wireless sensor and actuator networks for environment monitoring and control, and on middleware for network interoperability;
Bioinformatics (BIO-EDA group), with special emphasis on algorithms and tools for computational biology, next generation sequencing (NGS), molecular dynamics, biomedical signal and image processing, and genetic network implementation.

4 NGS Team
NGS: Andrea Acquaviva, Elisa Ficarra, Francesco Abate, Giulia Paciello, Gaspare Scherma, Gianvito Urgese
External Collaborations:
Raul Rabadan (Department of Biomedical Informatics, Columbia University, USA);
Alberto Ferrarini, Massimo Delledonne (Department of Biotechnology, University of Verona, ITALY);
Ilaria Iacobucci, Simona Soverini, Giovanni Martinelli (Department of Medical Oncology and Hematology L. e A. Seràgnoli, University of Bologna, ITALY);
Roberto Piva, Giorgio Inghirami (CERMS, Torino, ITALY);
Alberto Zamò (Department of Pathology and Diagnostics, University of Verona, ITALY);
Enzo Medico, Claudio Isella, Consalvo Petti (IRCC, Candiolo, ITALY);
Raffaele Calogero (MBC, University of Torino, ITALY).

5 CHIMERIC TRANSCRIPTS DETECTION TOOL Biological Overview
Fusion transcripts are chimeric RNAs that can be encoded by:
FUSION GENES;
TRANS-SPLICING EVENTS.
[Figure panels: Cis Splicing, Translocation, Deletion, Trans Splicing, Chromosomal Inversion]

6 CHIMERIC TRANSCRIPTS DETECTION TOOL Graphic representations
[Figure: fusion transcript (e.g. BCR-ABL) between Gene A and Gene B, showing concordant reads, discordant reads, a splicing event and a gene fusion. Region labels: Exon - GA, Intron - GA, Intergenic Region, Exon - GB]

7 CHIMERIC TRANSCRIPTS DETECTION TOOL Fusion Transcripts Detection Tool: Bellerophontes
Bellerophontes: an RNA-Seq data analysis framework for chimeric transcripts discovery based on accurate fusion model. Francesco Abate, Andrea Acquaviva, Giulia Paciello, Elisa Ficarra, Alberto Ferrarini, Massimo Delledonne, Simona Soverini, Giovanni Martinelli, and Enrico Macii. Bioinformatics Aug 15
Bellerophontes Features:
Accurate junction model definition implemented by a set of modular filters;
Splicing-driven alignment and abundance estimation analysis through TopHat and Cufflinks;
Effective junction detection based on alignment of unmapped reads on a virtual reference.

8 CHIMERIC TRANSCRIPTS PRIORITIZATION TOOL Chimeric Transcripts Prioritization Tool: Pegasus
Pegasus: a comprehensive annotation tool for detection of biologically relevant gene fusions in cancer. Francesco Abate, Andrea Acquaviva, Elisa Ficarra, Giorgio Inghirami and Raul Rabadan. UNDER REVIEW
Pegasus performs:
The creation of a complete Fusion Candidates Database containing the entire set of gene fusion candidates detected by any of the fusion detection tools;
The reassembly of the chimeric transcript on the basis of the two genes involved in the fusion, the genomic breakpoint coordinates and the gene annotations;
The annotation of the assembled fusion sequence, which provides information on the fusion frame and generates a complete report of the protein domains conserved and lost in the gene fusion and of whether a kinase gene is involved.
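The idea of a "fusion frame" can be illustrated with a minimal Python sketch. This is not the Pegasus implementation, which the slide does not describe; the function name `is_in_frame` and its two parameters are hypothetical and assume that the breakpoints have already been converted into coding-sequence (CDS) offsets. A fusion is treated as in-frame when the 3' partner resumes at the same codon position at which the 5' partner stops.

```python
def is_in_frame(cds_bases_kept_5p: int, cds_bases_skipped_3p: int) -> bool:
    """Return True if the fused coding sequence preserves the reading frame.

    cds_bases_kept_5p:    coding bases of the 5' gene retained before the breakpoint
    cds_bases_skipped_3p: coding bases of the 3' gene lost before its breakpoint
    Both arguments are hypothetical inputs used only for this illustration.
    """
    if cds_bases_kept_5p < 0 or cds_bases_skipped_3p < 0:
        raise ValueError("CDS offsets must be non-negative")
    # The next base after the 5' part sits at codon position (kept % 3);
    # the first retained base of the 3' gene natively sits at (skipped % 3).
    return cds_bases_kept_5p % 3 == cds_bases_skipped_3p % 3


if __name__ == "__main__":
    print(is_in_frame(300, 150))  # both breakpoints fall on codon boundaries
    print(is_in_frame(301, 150))  # one extra base shifts the frame
```

A real annotation step would also have to handle breakpoints in introns or untranslated regions, which this sketch deliberately ignores.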

9 INTERESTING CHIMERIC TRANSCRIPTS ANALYSIS
The fusions considered significant after Pegasus analysis, on the basis of the fusion frame, the presence of kinases and the domains conserved or lost in the gene fusion, must nevertheless be investigated further before PCR validation, in order to avoid experiments on false gene fusions.
Ad hoc analysis pipelines have been developed on the basis of the kind of data provided (read lengths, coverage, data format, pathology).
The developed pipelines are intended to integrate the information coming from biologists, doctors and biotechnologists with the Pegasus outputs.

10 VDJ RECOMBINATION DETECTION TOOL Biological Overview
The variable regions of the immunoglobulin heavy (IGH) and immunoglobulin light (IGL) chains of the BCR are assembled from germline V, D, J segments and V, J segments respectively, through a site-specific recombination reaction called V(D)J recombination that takes place in developing T and B lymphocytes.
[Figure panels: Genes in Heavy Chain Locus; VDJ recombination]
The resulting diversity determines the huge variability of possible interactions between antigens and antigen receptors. These cells can expand under specific conditions (e.g. antigen encounter) and form monoclonal populations bearing identically rearranged gene segments. Such clonal populations are usually kept under tight control. In some circumstances, however, they may expand to an extent that causes disease, as in autoimmune disorders, leukemias and lymphomas.

11 VDJ RECOMBINATION DETECTION TOOL VDJ-Recombination Detection Tool: V(D)J-Seq
VDJ-Seq: In Silico V(D)J Recombination Detection tool. Giulia Paciello, Andrea Acquaviva, Francesco Abate, Chiara Pighi, Alberto Ferrarini, Massimo Delledonne, Alberto Zamò and Elisa Ficarra. UNDER REVIEW
VDJ-Seq workflow:
1) MAIN CLONE IDENTIFICATION
Retrieval of VJ-encompassing reads;
Calculation of sorted VJ couple occurrences;
D allele identification.
2) VDJ SEQUENCE RETRIEVAL
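The occurrence-counting step of main clone identification can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the VDJ-Seq code: the function `rank_vj_couples` is hypothetical, it assumes each VJ-encompassing read has already been assigned one V and one J segment, and the read assignments in the example are made-up data.

```python
from collections import Counter


def rank_vj_couples(read_assignments):
    """Count how often each (V, J) couple occurs and sort by decreasing count.

    read_assignments: iterable of (v_segment, j_segment) tuples, one per
    VJ-encompassing read (hypothetical input format for this sketch).
    Returns a list of ((v_segment, j_segment), count) pairs.
    """
    return Counter(read_assignments).most_common()


if __name__ == "__main__":
    # Made-up assignments, for illustration only.
    reads = [
        ("IGHV3-23", "IGHJ4"),
        ("IGHV1-69", "IGHJ6"),
        ("IGHV3-23", "IGHJ4"),
        ("IGHV3-23", "IGHJ4"),
        ("IGHV1-69", "IGHJ6"),
        ("IGHV4-34", "IGHJ5"),
    ]
    ranked = rank_vj_couples(reads)
    main_couple, support = ranked[0]
    print(main_couple, support)
```

The most frequent couple is a natural candidate for the main clone; how dominant it must be before it is called monoclonal is a threshold choice that the slide does not specify.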

12 SINGLE GENE ANALYSIS
On the basis of the kind of data (read format, coverage, read lengths, ...), ad hoc analysis pipelines have been developed to analyze genes considered of remarkable importance in different pathologies.
By means of these pipelines it is possible to:
Detect intron retentions;
Define isoform transcripts;
Determine expression levels.

13 DIFFERENTIAL EXPRESSION ANALYSIS Differential Expression Analysis (1)
Finding genes that are differentially expressed between conditions is an integral part of understanding the molecular basis of phenotypic variation. In the past decades, microarrays have been used extensively to quantify the abundance of mRNA corresponding to different genes, and more recently RNA-seq has emerged as a powerful competitor. As the cost of sequencing decreases, it is conceivable that the use of RNA-seq for differential expression analysis will increase rapidly.
The most common use of transcriptome profiling is the search for differentially expressed (DE) genes, that is, genes that show differences in expression level between conditions or are otherwise associated with given predictors or responses.
RNA-seq offers several advantages over microarrays for differential expression analysis:
An increased dynamic range and a lower background level;
The ability to detect and quantify the expression of previously unknown transcripts and isoforms.

14 DIFFERENTIAL EXPRESSION ANALYSIS Differential Expression Analysis (2)
The analysis of RNA-Seq data is, however, not without difficulties. These difficulties can be inherent to next-generation sequencing procedures (within-sample biases) or not (between-sample biases):
Variation in nucleotide composition between genomic regions implies that the read coverage may not be uniform along the genome;
More reads will map to longer genes than to shorter ones with the same expression level;
The sequencing depths or library sizes (the total number of mapped reads) are typically different for different samples, so counts are not directly comparable between samples.
Ad hoc analysis pipelines, which comprise data normalization, the choice of the most suitable models for differential expression analysis and the correct setting of the thresholds, have been developed on the basis of the kind of data and the conditions to be tested.
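The gene-length and library-size effects listed above can be made concrete with a small Python sketch of two common normalizations, counts per million (CPM) and reads per kilobase per million mapped reads (RPKM). These are standard textbook formulas chosen here for illustration; the slides do not say which normalization the pipelines actually use, and the function names, gene names and numbers below are assumptions.

```python
def cpm(counts, library_size):
    """Counts per million: corrects for library size (total mapped reads)."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {gene: c * 1e6 / library_size for gene, c in counts.items()}


def rpkm(counts, gene_lengths_bp, library_size):
    """Reads per kilobase per million: corrects for gene length and library size."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {
        gene: c * 1e9 / (gene_lengths_bp[gene] * library_size)
        for gene, c in counts.items()
    }


if __name__ == "__main__":
    # Made-up example: geneB is twice as long and collects twice the reads.
    counts = {"geneA": 100, "geneB": 200}
    lengths = {"geneA": 1000, "geneB": 2000}
    mapped_reads = 1_000_000
    print(cpm(counts, mapped_reads))
    print(rpkm(counts, lengths, mapped_reads))
```

In this made-up example the CPM of geneB is twice that of geneA, while the two RPKM values coincide, which is the length effect described in the second point above. Dedicated differential expression models typically work on raw counts with their own normalization factors, so a sketch like this serves exploration and not testing.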