2016 ANNUAL REPORT BIG


2016 ANNUAL REPORT
BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences

China Genomic Data Sharing Initiative

China has become a powerhouse in generating enormous amounts of genomic data but is in the embarrassing situation of lacking a national resource for genomic data deposition and sharing. To meet the crucial needs of the China National Big Data Strategy, the BIG Data Center for Life and Health Sciences, established in the Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS), with funding support primarily from the Ministry of Science and Technology and CAS, has developed the Genome Sequence Archive (GSA; gsa.big.ac.cn), a data repository for archiving raw sequence reads and facilitating genomic data deposition, management and sharing. GSA is equivalent to the Sequence Read Archive (SRA) at the National Center for Biotechnology Information (NCBI), United States. The ultimate goal of GSA is to promote centralized management of China's genomic data and to form national strategic development resources of genomic data in support of R&D activities in both academia and industry.

Hence we propose the China Genomic Data Sharing Initiative to raise public awareness and reach consensus among domestic and foreign academic institutions, universities, enterprises and government organizations. The initiative is as follows:

A standard practice should be established as soon as possible for centralized management and public sharing of China genomic data.
Genomic data derived from China national scientific research projects should be managed in a centralized manner in China.
Raw sequence data should be deposited into the Genome Sequence Archive (GSA).

BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences
No.1 Beichen West Road, Chaoyang District, Beijing, China
+86 (10) (10)
bigd@big.ac.cn

China Genomic Data Sharing Initiative
BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences

List of Participants (1115 participants from 425 organizations)

Name / Organization
郝柏林 复旦大学理论生命科学研究中心 宁泽民 Wellcome Trust Sanger Institute 罗静初 北京大学生物信息中心 王秀杰 中国科学院遗传与发育生物学研究所 徐 中国科学院生物物理研究所 薛 宇 华中科技大学生命科学与技术学院 傅小兰 涛 中国科学院心理研究所 高 歌 北京大学生物信息中心 胥伟华 中国科学院遗传与发育生物学研究所 赵方庆 中国科学院北京生命科学研究院 曹晓风 中国科学院遗传与发育生物学研究所 朱朝东 中国科学院动物研究所 施 中国科学院昆明动物研究所 苏志熙 复旦大学 陈洛南 中国科学院上海生命科学研究院 于 云南大学 薛勇彪 中国科学院北京基因组研究所 汪小我 清华大学 中国科学院遗传与发育生物学研究所 李伟忠 中山大学中山医学院 王丽萍 中国科学院北京基因组研究所 秦 峰 中国科学院植物研究所 张德兴 中国科学院北京基因组研究所 叶 凯 西安交通大学 中国科学院动物研究所 陈玲玲 华中农业大学 复旦大学生物医学研究院 鹏 黎 中国科学院北京基因组研究所 刘 哈佛大学 辛德莉 首都医科大学附属北京友谊医院 张学工 清华大学 苏志熙 复旦大学 杨卫平 中国科学院监督审计局 历 军 中科曙光 彭 中国科学院国际合作局 李瑞强 诺禾致源 中国科学院条财局信息化工作处 郑洪坤 北京百迈客生物科技有限公司 军 颖 陈明奇 雷 刘小乐 于

[Figure: map of participating organizations in cities across China, grouped by type: Government, Institute, University, Hospital, Enterprise, Media, Others]

[Figure: BIG Data Center database resources: Genome Sequence Archive (Raw Sequence Reads); Genome Warehouse (Genes, Genomes and Sequences); Genome Variation Map (Sequencing & Array Variants); Gene Expression Nebulas (RNA-Seq Expression Profiles); Methylation Bank (DNA & RNA Methylomes); Science Wikis (Wikis for Community Annotations)]

The BIG Data Center, officially founded on 29 February 2016, advances life and health sciences by providing free and open access to a variety of data resources, with the aim of translating big data into big knowledge and supporting worldwide research activities in both academia and industry.

Objective and Goal

The rapid advancement of high-throughput sequencing technologies provides formidable capacity for genome sequencing, producing biological data at an unprecedented, exponential rate and accumulating huge amounts of data at multiple omics levels. To address the most important and complex biological questions, researchers often need open access to various data resources.

China has become a powerhouse in generating vast quantities of biological data, but is in the embarrassing situation of lacking a centralized data center that is committed to opening data in this big data world and to making data well organized and publicly accessible to scientific communities worldwide.

The BIG Data Center, established at the Beijing Institute of Genomics, Chinese Academy of Sciences, is dedicated to constructing and maintaining biological databases through big data integration and value-added curation, conducting basic research to translate big data into big knowledge, and providing free and open access to a variety of data resources in support of worldwide research activities in both academia and industry.

Organizational Structure

[Figure: organizational chart showing Leadership, Scientific Advisory Board and Team Leaders; the areas Administration, Deposition, Integration and Translation; and the work teams: Hardware & System Administration, Genome Sequence Archive, Genome Variation Map, Methylation Bank, Precision Medicine Knowledgebase, Genome Warehouse, Gene Expression Nebulas, Science Wikis, Electronic Health Record]

Scientific Advisory Board

Prof. Vladimir Bajic, Director of Computational Bioscience Research Center, King Abdullah University of Science and Technology, Saudi Arabia
Dr. Guy Cochrane, Head of European Nucleotide Archive, European Bioinformatics Institute (EBI), United Kingdom
Dr. Frank Eisenhaber, Head of Biomolecular Function Discovery Division and Executive Director, Bioinformatics Institute, Singapore
Prof. Takashi Gojobori, Former Vice-Director of National Institute of Genetics, Japan; Distinguished Professor of King Abdullah University of Science and Technology, Saudi Arabia
Prof. Jingchu Luo, China Node Manager of the European Molecular Biology Network (EMBnet), Center for Bioinformatics, Peking University, China
Dr. Ilene Mizrachi, GenBank Coordinator, National Center for Biotechnology Information (NCBI), United States
Prof. Weimin Zhu, Head of Data Science, National Center for Protein Sciences, China

Leadership

Zhang Zhang, Executive Director of BIG Data Center
Tel: +86 (10)
Dr. Zhang is a Professor at the Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS). He obtained his PhD degree in Computer Science from the Institute of Computing Technology, CAS. Prior to joining BIG, he worked as a Postdoctoral Associate at Yale University from 2007 to 2009 and as a Research Scientist at King Abdullah University of Science and Technology from 2009 to 2011. Dr. Zhang was selected for the CAS 100-Talent Program in 2011 and was later appointed Executive Director of BIG Data Center. His research focuses on big data integration and mining, molecular evolution and computational precision health. As of December 2016, he has authored 57 papers in scientific journals and serves as Associate Editor-in-Chief of Genomics Proteomics & Bioinformatics, Editorial Board Member of Biology Direct, Academic Editor of PLoS ONE, and Executive Committee Member of the International Society for Biocuration.

Wenming Zhao, Deputy Director of BIG Data Center
zhaowm@big.ac.cn
Tel: +86 (10)
Mr. Zhao is Deputy Director of BIG Data Center. He was selected for the CAS Key Technology Talent Program. His research interests lie in NGS data analysis and bioinformatics database construction. Mr. Zhao leads the Genome Sequence Archive (GSA) team; GSA is the first raw sequence archive resource in China. He is responsible for constructing and maintaining the high-performance computing infrastructure for BIG and building the Bioinformatics Cloud Computing Platform.

Jingfa Xiao
xiaojf@big.ac.cn
Tel: +86 (10)
Dr. Xiao is a Professor at the Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS). He obtained his PhD degree from Jilin University, China. Prior to joining BIG, he worked as a Postdoctoral Associate at Peking Union Medical College and the University of Utah from 2003 to 2007. He has published more than 60 papers in scientific journals, including Bioinformatics, Nature Communications and Briefings in Bioinformatics. He has been invited to serve as an editorial board member for Frontiers in Plant Science, GPB and PeerJ. His current research focuses on high-throughput omics data integration and mining and on genome informatics for precision medicine.

Team Leaders (TL)

Yanqing Wang, Team Leader of Genome Sequence Archive: MS in Technology of Computer Application from the Computer Network Information Center, CAS, in 2009; joined BIG in 2009, with a particular focus on software and database development.
Meili Chen, Team Leader of Genome Warehouse: PhD in Bioinformatics from the University of CAS in 2013; joined BIG in 2013, with research interests in data mining of high-throughput omics data and database development.
Shuhui Song, Team Leader of Genome Variation Map: PhD in Bioinformatics from BIG, CAS, in 2008; joined BIG in 2008, with research interests in population genomics, genome variation, cancer bioinformatics and RNA methylation.
Lili Hao, Team Leader of Gene Expression Nebulas: PhD in Bioinformatics from BIG, CAS, in 2011; joined BIG in 2011, with research interests in transcriptional and post-transcriptional regulation and transcriptome data integration and mining.
Rujiao Li, Team Leader of Methylation Bank: PhD in Physical Chemistry from Jilin University in 2005; joined BIG in 2008, with research interests in DNA methylation, epigenome data integration and database development.
Lina Ma, Team Leader of Science Wikis: PhD in Bioinformatics from BIG, CAS, in 2010; joined BIG in 2012, with research interests in long non-coding RNA integration and evolution, RNA regulatory networks and community annotation.
Zhenglin Du, Team Leader of Precision Medicine: PhD in Biochemistry & Molecular Biology from China Agricultural University in 2007; joined BIG in 2008, with research interests in comparative genomic analysis and big data integration and curation.
Li Lan, Team Leader of Administration: MBA from the University of Alberta, Canada, in 2004 and MS in Simultaneous Interpretation from South China Normal University in 2002; joined BIG in 2015, responsible for center administration and international affairs.
Huanxin Chen, Team Leader of Hardware & System Administration: BS in Business Administration from Beijing Open University in 2005; joined BIG in 2006, responsible for network and computer systems administration.

Personnel

The BIG Data Center features energetic, collaborative and interdisciplinary working teams. As of December 2016, the BIG Data Center has a total of 54 people, including 30 staff members and 24 graduate students, with an average age of 32.

[Figure: Personnel Structure, by role: Bioinformatics & Biocuration Scientist, Hardware & System Administrator, Database & Software Developer, Graduate Student]

[Figure: Degree Structure, by degree: BS, MS, PhD]

Graduate Students

BIG Students (PhD and MS)
Lin Xia, Xin Sheng, Xufei Teng, Chen Gao, Guangyi Niu, Shixiang Sun, Shuo Shi, Jinyue Wang, Lijuan Zhang, Chunlei Yu, Hongyan Yin, Guangyu Wang, Yadong Zhang, Fuwen Yao, Hongyi Lv, Wan Fang, Xingjian Xu, Jian Sang, Man Li, Lin Liu, Mengwei Li, Qing Zhou

Visiting Students
Zhennan Wang, Nan Li, Fang Liu

Database Resources

[Figure: BIG Data Center database resources: Genome Sequence Archive (Raw Sequence Reads); Genome Warehouse (Genes, Genomes and Sequences); Genome Variation Map (Sequencing & Array Variants); Gene Expression Nebulas (RNA-Seq Expression Profiles); Methylation Bank (DNA & RNA Methylomes); Science Wikis (Wikis for Community Annotations)]

Genome Sequence Archive (GSA): a data repository for archiving sequence reads
Genome Warehouse (GWH): a centralized resource housing genome-scale data
Genome Variation Map (GVM): a comprehensive collection of genome variations for featured species
Gene Expression Nebulas (GEN): a data portal of gene expression profiles based entirely on RNA-Seq data
Methylation Bank (MethBank): an integrated database of whole-genome single-base resolution methylomes
Science Wikis: a central access point for biological wikis developed for community annotations

Databases: Genome Sequence Archive

Introduction

The Genome Sequence Archive (GSA) is a data repository specialized for archiving raw sequence reads. It supports data generated from a variety of sequencing platforms, ranging from Sanger sequencing machines to single-cell sequencing machines, and provides data storage and sharing services free of charge for scientific communities worldwide. In addition to raw sequencing data, GSA also accommodates secondary analysis files in accepted formats (such as BAM and VCF).

Landmark

Oct. 2015: GSA version 1.0 became available online as the first sequence archive in China functionally equivalent to those of INSDC members.
Dec. 2015: Data submissions to GSA were reported by multiple high-profile journals, including PNAS, AJHG and Cell Research.
Apr. 2016: Invited talk at the 2016 International Biocuration Conference in Switzerland.
Sept. 2016: Proposed the China Genomic Data Sharing Initiative, which is widely supported by domestic and foreign academic institutions, universities and other organizations.
Dec. 2016: The paper on the BIG Data Center, covering GSA and other database resources, was published in Nucleic Acids Research.

Database Systems

[Figure: GSA database systems, labeled Big Data Systems and National Genomic Big Data, comprising the Metadata Collection System, Data Submission System, Quality Control System, Storage System and Service System; data objects shown are BioProject, BioSample, Experiment, Run, File Size and Omics data; services shown are Local Service, Data Security, Peer Review and Online Support]

Achievements

As of Dec. 2016, GSA has archived data submissions for ~200 BioProjects, submitted by a total of ~160 registered submitters from 39 institutions.
GSA provides services for CAS Strategic Priority Research Programs, including Evolutionary Genotype-Phenotype Systems Biology and Molecular Model Design and Breeding.

[Figure: GSA data statistics: number of BioProjects, number of BioSamples, number of Experiments/Runs, and file size in terabytes (TB)]

Features

Free data archive for worldwide research communities
Public data available throughout the world
Standards compatible with INSDC
Massive data storage and retrieval capacity
Well-organized metadata and sequencing data
Secure and convenient links for peer review
Stable system for data deposition and sharing

Team Members

Yanqing Wang (TL), wangyanqing@big.ac.cn, Database Design & Web Development
Junwei Zhu, zhujw@big.ac.cn, Database Design & Web Development
Tingting Chen, chentt@big.ac.cn, Web Development
Sisi Zhang, zhangss@big.ac.cn
Lili Dong, donglili@big.ac.cn

Databases: Genome Warehouse

Introduction

The Genome Warehouse (GWH) is a centralized resource housing genome-scale data produced by different genome sequencing initiatives. It integrates high-quality genome sequences and related data and provides users with open access to a collection of genomes from featured species, including the giant panda, chicken, pig and other important animals, and rice, soybean, maize and other important plants.

Features

Integrate multi-species genome data from important animals, plants and microbes
Provide a variety of services for data submission, storage, release and sharing
Offer standardized quality control for genome sequences and genome annotations
Provide online tools for sequence alignment, pan-genome analysis, etc.

Team Members

Meili Chen (TL), chenml@big.ac.cn, System Design & Data
Fan Wang, wangfan@big.ac.cn, Database Development
Zhewen Zhang, zhangzw@big.ac.cn
Jian Sang, sangj@big.ac.cn

Databases: Genome Variation Map

Introduction

The Genome Variation Map (GVM) is a data repository and retrieval system for genome variations, including single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels). GVM focuses on genome variations in humans as well as domesticated animals (e.g. dog) and cultivated plants (e.g. rice), which are of great importance for population precision medicine studies, exploration of favorable traits and investigation of species domestication and evolution.

[Figure: Species-Specific Variation Databases and their applications: Functional genomics, Evolutionary studies, GWAS/eGWAS/pGWAS, Pharmacogenomics]

Features

Construct a genetic variation atlas for the Chinese population
Integrate genomic variations for featured animals and plants in China
Collect genotype and phenotype information and develop genome-wide association mining analysis tools
Build species-/micro-evolution analysis resources

Team Members

Shuhui Song (TL), songshh@big.ac.cn, System Design & Data
Bixia Tang, tangbx@big.ac.cn, Tool & Database Development
Dongmei Tian, tiandm@big.ac.cn
Lili Dong, donglili@big.ac.cn
Cuiping Li, licp@big.ac.cn

Databases: Gene Expression Nebulas

Introduction

The Gene Expression Nebulas (GEN) is a data portal of gene expression profiles based entirely on RNA-Seq data. High-throughput sequencing technologies provide a revolutionary way to profile transcriptomes, enable facile generation of large-scale sequencing data and accordingly facilitate high-resolution quantification of gene expression levels across a variety of tissues and treatments. Thus GEN, which integrates RNA-Seq-derived gene expression profiles, is of fundamental significance for deciphering functional elements under diverse conditions and characterizing the dynamics of transcriptomic regulation.

Functional Modules

[Figure: GEN functional modules: Data, Pipeline, Standards, Submission, Tools and Knowledge Base, with labels Cloud platform tools, Experts curation, User submission, Expression, Evolution and Transcriptional Regulation]

Features

RNA-Seq-based expression profiles for important species
Standardized pipelines to estimate gene expression levels
Cloud platform for analyzing raw expression data
Tools for homolog identification and co-expression network construction
Knowledge mining for gene regulation

Team Members

Lili Hao (TL), haolili@big.ac.cn, System Design & Data
Lin Xia, xialin@big.ac.cn
Xin Sheng, shengxin@big.ac.cn, Web Development

Databases: Methylation Bank

Introduction

The Methylation Bank (MethBank) is a repository that integrates whole-genome single-base resolution methylomes. It collects DNA and RNA methylation data and provides an interactive browser for visualizing high-resolution methylation data. It incorporates data from human, animals and plants and integrates a variety of methylation types. MethBank is the first database covering both DNA and RNA methylation profiles.

Content

[Figure: MethBank content: 3 sub-databases (Human: Precision Medicine; Animals: Embryonic Development; Plants: Crops) and methylation types 5mrC, 5hmrC, 6mdA and 5mdC. Values shown: 4,769 samples; 2 species; 18 methylomes; 114,787 DMPs; 38,205 methylated CpGIs; 142,665 DMRs; 5 species; 72 methylomes; 152,790 DMPs; 92,642 methylated CpGIs]

Features

Cover both DNA and RNA methylation data
Include three major methylation types: 5mC, 5hmC and 6mA
Incorporate high-quality methylation profiles for multiple species
Provide visualizations for whole-genome single-base resolution methylomes
Offer online tools for whole-genome bisulfite sequencing data analysis

Team Members

Rujiao Li (TL), lirj@big.ac.cn, System Design & Data
Fang Liang, liangf@big.ac.cn
Dong Zou, zoud@big.ac.cn, Database Development

Databases: Science Wikis

Introduction

The Science Wikis are a series of biological databases wikified for community curation, among which LncRNAWiki and RiceWiki are two featured resources that exploit the full potential of worldwide scientific communities for big data collection, integration and management.

Content

RiceWiki: A Wiki-based Database for Community Curation of Rice Genes
WikiCell: A Unified Resource Platform for Human Transcriptomics Research
LncRNAWiki: A Wiki-based Platform for Community Curation of Human Long Non-coding RNAs
ESND: A Wiki-based English-to-Chinese Scientific Nomenclature Dictionary

Curation Modules

[Figure: community curation workflow linking Community Curators, Database and Journals, with labels Curation training, Quality control, Curate, Reward, Post-publication requirement and Effort validation]

Team Members

Lina Ma (TL), malina@big.ac.cn, System Design & Data
Lin Liu, liul@big.ac.cn
Dong Zou, zoud@big.ac.cn, Database Development
Jian Sang, sangj@big.ac.cn

Databases: Chinese Reference Genome

Introduction

BIG leads the CAS Precision Medicine Cohort Research Program and participates in the National Key Research and Development Project on Precision Medicine, producing large-scale whole-genome deep sequencing data for the Chinese population. To support precision health studies, it is critical to perform genetic variation analysis on population sequencing data, to build a high-precision genetic variation map and to develop the Chinese Reference Genome Database.

Content

Whole-genome sequencing and variation analysis of ~400 Chinese individuals
Integrated analysis of multi-level genetic variations, including SNPs, indels and SVs
Construction of a high-precision genetic variation map for the Chinese population
Upgrade of the Virtual Chinese Genome Database (VCGDB) based on genome variations

Features

Build a standard pipeline for large-scale population genetic variation analysis
Construct the reference panel for genotype imputation
Establish a genetic variation map for the Chinese population

Team Members

Zhenglin Du (TL), duzhl@big.ac.cn, Population Genome Reference
Na Yuan, yuann@big.ac.cn
Jingyao Zeng, zengjy@big.ac.cn

Training

Genomics & Bioinformatics Training

The Genomics & Bioinformatics Training (GBT) is organized by the Beijing Institute of Genomics, Chinese Academy of Sciences. GBT brings in a number of senior and experienced scientists as trainers and provides genomics and bioinformatics courses for researchers and biomedical professionals at postgraduate level and above. Since the first GBT in 2008, more than 830 people have participated in this training.

Content

Principles and applications of second- and third-generation sequencing technologies
Large-scale genome and transcriptome sequencing
Bioinformatics algorithms and databases
RNA-Seq, ChIP-Seq and BS-Seq data analysis
Linux-based operations

Conferences & Visitors

Conferences

Young Bioinformatics PI Workshop, August 12-13, 2016, Beijing
Big Data Forum for Life and Health Sciences, December 5-8, 2016, Beijing

Visitors

March 2016, Dr. Ewan Birney and Dr. Rolf Apweiler from EBI
March 2016, Dr. Olivier Le Gall from INRA
May 2016, Dr. Enge Wang (Vice President of CAS)

Hardware Resources

Introduction

With the rapid development of next-generation sequencing technologies, omics data are generated at increasingly explosive scales and rates, bringing biological research into the big data era. The BIG Data Center, supported by the CAS Instrument Sharing Management Platform, aims to establish a high-performance computing environment and continually upgrade its infrastructure. It currently has 1.6 Gbps of international network bandwidth, 100 TFLOPS (tera floating-point operations per second) of computing resources and nearly 10 PB of storage capacity. It also provides data computing, storage and sharing services, serving 16 CAS institutions and a total of 74 research groups, with an average of more than 500 active users online and more than 3000 tasks running per day.

Team Members

Huanxin Chen (TL), chenhx@big.ac.cn, System Administration
Yunbin Sun, sunyb@big.ac.cn, Computing Cluster Administration
Lei Yu, yul@big.ac.cn, Computing Service and Administration
Mingyuan Sun, sunmy@big.ac.cn, Computing Network Administration
Shuang Zhai, zhaish@big.ac.cn, High-Performance Computing & Storage

Publications

1. The BIG Data Center: from deposition to integration to translation. Nucleic Acids Res. 45, D18-D24 (2017).
2. GSA: Genome Sequence Archive. Genomics Proteomics Bioinformatics. 15 (2017).
3. RNAcentral: a comprehensive database of non-coding RNA sequences. Nucleic Acids Res. 45, D128-D134 (2017).
4. MTD: a mammalian transcriptomic database to explore gene expression and regulation. Brief Bioinform. 18 (2017).
5. GAAP: Genome-organization-framework-Assisted Assembly Pipeline for the Prokaryotic Genome. BMC Genomics. 18, 952 (2017).
6. Transcriptome Reveals Dynamic Changes in Coxsackievirus A16 Infected HEK 293T Cells. BMC Genomics. 18, 933 (2017).
7. CloudPhylo: a fast and scalable tool for phylogeny reconstruction. Bioinformatics. 33 (2017).
8. Genomic analysis of snub-nosed monkeys (Rhinopithecus) identifies genes and processes related to high-altitude adaptation. Nat. Genet. 48 (2016).
9. Integrated analysis of phenome, genome, and transcriptome of hybrid rice uncovered multiple heterosis-related loci for yield increase. Proc Natl Acad Sci USA. 113 (2016).
10. Information Commons for Rice (IC4R). Nucleic Acids Res. 44, D1172-D1180 (2016).
11. Precision methylome characterization of Mycobacterium tuberculosis complex (MTBC) using PacBio single-molecule real-time (SMRT) technology. Nucleic Acids Res. 44 (2016).
12. SorGSD: a sorghum genome SNP database. Biotechnol Biofuels. 9, 6 (2016).
13. Characterization of spectinomycin resistance in Streptococcus suis: two novel insights into the drug resistance formation and dissemination mechanism. Antimicrob. Agents Chemother. 60 (2016).
14. Pangenome evidence for higher codon usage bias and stronger translational selection in core genes of Escherichia coli. Front Microbiol. 7, 1180 (2016).
15. Comparative genomics analysis of Streptomyces species reveals their adaptation to the marine environment and their diversity at the genomic level. Front Microbiol. 7, 998 (2016).
16. What signatures dominantly associate with gene age? Genome Biol Evol. 8 (2016).
17. Randomness in sequence evolution increases over time. PLoS One. 11 (2016).
18. Old genes experience stronger translational selection than young genes. Gene. 590 (2016).
19. Complete genome sequence and comparative genome analysis of a new special Yersinia enterocolitica. Arch Microbiol. 198 (2016).
20. BS-RNA: An efficient mapping and annotation tool for RNA bisulfite sequencing data. Comput Biol Chem. 65 (2016).
21. Precision Medicine: What Challenges Are We Facing? Genomics Proteomics Bioinformatics. 14 (2016).
22. Constructing the international database management system for omics big data. Big Data Res. 2, 4352 (2016).

Funding

The BIG Data Center, as part of the Beijing Institute of Genomics, Chinese Academy of Sciences (CAS), is funded primarily by the government and CAS. It is supported by grants from:

National Programs for High Technology Research and Development
National Natural Science Foundation of China
National Key Research Program of China
Strategic Priority Research Program of the Chinese Academy of Sciences
International Partnership Program of the Chinese Academy of Sciences
Key Program of the Chinese Academy of Sciences

Image Courtesy: Prof. Yungui Yang