Biobanks for biomedical research


1 Paolo Romano

2 Outline
- Biobanks
  - Biological materials for research
  - Biobanking for Precision Medicine
  - Biobank definition, activities, classification
- Information systems for biobanks
  - Requirements
  - Privacy protection
- BBMRI-ERIC infrastructure
  - BBMRI-ERIC platform
- Biobanks missing information

3 Biobanks

4 Biological materials for research Biological materials are living organisms or their parts used in biological experiments. They must be properly collected, stored, characterized, maintained, distributed and used. Since the 1800s, collections of micro-organisms have been created for research purposes, mainly classification and characterization. Service collections were established to preserve and distribute quality strains. The first service collection was created in Prague (1880s). The collection of the Institut Pasteur was created in 1892.

5 Biological materials for research In the second half of the 1900s, the use of biological materials became a standard methodology in cellular biology for the in vitro analysis of cell behaviour, both under normal conditions and under stress. Immortalized cell lines, mainly created by virus infection of a single cell and adapted to grow continuously, came into common use in order to: have stable biological resources to be used as models, with selected characteristics and desired features; improve the replicability and reproducibility of experiments; provide an ethical alternative to in vivo studies on animals.

6 Biological materials for research Cell lines are highly specialized and show little biological variability. They are useful models, but do not represent the wide spectrum of genetic variability and possible behaviours of individuals. In this century, many omics sciences (genomics, proteomics, metabolomics, etc.) have been founded, and related technologies and methods have been developed. Being based on molecular profiling, they: lead to a better understanding of the factors and processes related to the etiopathology of diseases; enable new approaches to health care based on individual molecular profiles.

7 Precision Medicine Precision medicine is an approach to disease treatment and prevention that seeks to maximize effectiveness by taking into account individual variability in genes, environment, and lifestyle (Precision Medicine Initiative Working Group, 2015). Genomic analyses support the definition of individual patterns of disease (improved diagnostics) and the assessment of individual susceptibility to treatments (improved prognostics and therapy). From: Kinkorová J. The EPMA Journal (2016) 7:4

8 Biobanking Research institutes and hospitals started collecting tissues from patients and healthy individuals in organized infrastructures, taking individual rights into account, in order to make precision medicine possible. Biobanking has been identified as a key tool for: understanding the molecular basis of disease subtypes; clarifying the etiology of complex diseases; accelerating the development of new drugs and biomarker discovery. A more precise classification of disease may: accelerate the development of methods for identifying targeted and effective treatments; avoid inappropriate treatments, thereby reducing the incidence of undesired side effects of therapy; lead to new approaches to disease prevention and health promotion.

9 Biobanking Among the advantages of biobanks are:
- Statistical power: a large number of samples collected over years.
- Reduced costs: once collected, samples remain available for a number of studies; standard operating procedures and best practices make the collection of quality-assured samples more efficient and time-saving; a centralised IT platform creates access for many users.
- Ethics: well-controlled ethical processes strengthen trust.
One main limitation: unlike cell lines, biobank resources are available in a limited number of aliquots and can only be used a few times.

10 Biobank definition Many definitions for a biobank have been proposed:
- An organized collection of human biological material and associated information stored for one or more research purposes.
- Collections, repositories and distribution centres of all types of human biological samples, such as blood, tissues, cells or DNA and/or related data such as associated clinical and research data, as well as biomolecular resources, including model- and microorganisms that might contribute to the understanding of the physiology and diseases of humans.
- Biobanks contain biological samples and associated information that are essential raw materials for the advancement of biotechnology, human health, and research and development in life science.

11 Biobank definition All definitions consider relevant for a biobank: the proper conservation and distribution of research relevant biological materials, the management of associated data. This highlights the relevance of the information: materials alone are not sufficient. Legal issues are also considered in some definitions, mainly: informed consent of participants, safety and protection of individual sensitive data.

12 Biobank activities The processes of biobanking are quite complex and the activities of a biobank are quite diverse. Informed consent is gathered from participants through appropriate procedures. This is often done by clinical staff. The actual collection of materials may also be carried out by external staff, e.g. by surgeons at the operating table. Materials are usually characterized for diagnostic needs. For this reason, biobanks operate in close connection with the anatomic pathology department. The samples are subdivided into aliquots and properly stored. Samples are distributed for research projects, following clearly defined protocols which also consider ethical issues. Throughout these tasks, the associated data management procedures must be carried out.

13 Biobanking guidelines Various guidelines and standard procedures have been defined: The OECD Recommendation on Human Biobanks and Genetic Research Databases (HBGRD) provides guidance for the establishment, governance, and management of human biobanks. IARC provides minimum technical standards and protocols for biobanks for cancer research. A list of guidelines, procedures and recommendations applicable to biobanks has been provided by Yuille et al.

14 Classification of biobanks
Population-based biobanks:
- Collection of samples from a large number of people in the general population
- Not disease specific
- Useful for epidemiological studies as well as prospective studies
Disease-oriented biobanks:
- Collections from a subset of the population with specific conditions
- Include family collections
- Useful to study and assess the causes (both genetic and environmental) of the given condition

15 Classification of biobanks Special cases: clinical/control biobanks, which collect materials from patients affected by specific diseases and from healthy participants; longitudinal study biobanks, following a population over a long period of time; population isolate biobanks, with a homogeneous genetic and environmental setup of a given population; twin biobanks, with samples from both mono- and dizygotic twins.

16 Classification of biobanks Other biobank characteristics: type of stored materials (likely, more to come): blood (whole and components), plasma, serum, DNA, RNA, urine, cerebrospinal and synovial fluids, bone marrow stem cells, tissues (cryo- and paraffin-preserved); intended use: research, diagnosis, forensics; ownership (impacting both aims and organization): government, non-profit organizations, commercial companies.

17 Information systems for biobanks

18 Information systems for biobanks Biobank information systems must include:
- a description of the conserved biological materials;
- all data associated with the actual management of materials (including gathering, conservation and distribution): storage data and associated metadata, quality assurance data, up-to-date status and history of samples, information on sample requests and shipments, information on informed consents;
- information related to the participant, including clinical and epidemiological data;
- information related to the characterization of the samples, such as data generated from their analysis.
In order to protect sensitive data and participant privacy, coding and/or anonymisation methods must be applied.

19 Information systems for biobanks Three categories of requirements:
- for the organization and management of biobanks;
- for the management of sample requests, including a user-friendly form for querying clinical annotations, a system to connect sample data and storage location, and workflows for managing requests for samples for special projects that require approval by ethical and scientific committees;
- for the creation of clinical annotations, including tools to validate data, exploit standardized classification systems and nomenclatures, and perform data import and export in a flexible way.
In order to harmonize information and requirements, a standard data model is required.

20 Information systems for biobanks MIABIS: Minimum Information About BIobank data Sharing. MIABIS: reports a consensus list of the data required for the information system of a biobank; supports the reuse of data associated with biological materials by harmonizing their data models; enables the exchange of data on biological samples among biobanks.

21 Information systems for biobanks MIABIS is meant to connect all applications related to sample data. Merino-Martinez R et al. Biopreserv Biobank 2016;14:

22 Information systems for biobanks MIABIS has a modular structure. Core components refer to: biobanks, sample collections, studies. Components under development: samples, biological experiments, participants. A database schema exists which supports the creation of a detailed data model. Merino-Martinez R et al. Biopreserv Biobank 2016;14:
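As a rough illustration of the modular structure, the three core components could be modelled as linked records. This is a minimal Python sketch; the class and field names are hypothetical and are not the actual MIABIS attribute names.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    study_id: str
    name: str


@dataclass
class SampleCollection:
    collection_id: str
    name: str
    material_types: list[str] = field(default_factory=list)
    studies: list[Study] = field(default_factory=list)


@dataclass
class Biobank:
    biobank_id: str
    name: str
    collections: list[SampleCollection] = field(default_factory=list)

    def material_types(self) -> set[str]:
        """All material types held across the biobank's collections."""
        return {m for c in self.collections for m in c.material_types}


# Hypothetical example data.
biobank = Biobank(
    biobank_id="bb-001",
    name="Example Biobank",
    collections=[
        SampleCollection("col-1", "Colorectal cohort", ["tissue", "DNA"],
                         [Study("st-1", "Example study")]),
        SampleCollection("col-2", "Healthy controls", ["serum", "DNA"]),
    ],
)
print(sorted(biobank.material_types()))  # ['DNA', 'serum', 'tissue']
```

Keeping biobank, collection and study as separate records mirrors the idea that further components (samples, experiments, participants) can be attached later without changing the core.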

23 Privacy protection The combination of experimental data with detailed participant data poses major data protection requirements, because the identification of a participant through some careful queries cannot be excluded.
Hospital patient data:
DOB      Sex     Zipcode  Disease
1/21/76  Male             Heart Disease
4/13/86  Female           Hepatitis
2/28/76  Male             Bronchitis
1/21/76  Male             Broken Arm
4/13/86  Female           Flu
2/28/76  Female           Hang Nail
Voter registration data:
Name   DOB      Sex     Zipcode
Andre  1/21/76  Male
Beth   1/10/81  Female
Carol  10/1/44  Female
Dan    2/21/84  Male
Ellen  4/19/72  Female
Andre has heart disease! Ge Ruan. K-Anonymity and Other Cluster-Based Methods. 2007
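The re-identification shown on this slide amounts to joining two data sets on their shared attributes (date of birth, sex, zipcode). A minimal Python sketch of such a linkage follows; the zipcode values are placeholders invented for illustration, since none are given in the tables, and only a few rows are reproduced.

```python
# Attributes present in both data sets (the quasi-identifiers).
QUASI_IDENTIFIERS = ("dob", "sex", "zipcode")

# Zipcode values are invented placeholders, not data from the slide.
hospital = [
    {"dob": "1/21/76", "sex": "Male", "zipcode": "ZIP-A", "disease": "Heart Disease"},
    {"dob": "4/13/86", "sex": "Female", "zipcode": "ZIP-B", "disease": "Hepatitis"},
    {"dob": "2/28/76", "sex": "Male", "zipcode": "ZIP-C", "disease": "Bronchitis"},
]
voters = [
    {"name": "Andre", "dob": "1/21/76", "sex": "Male", "zipcode": "ZIP-A"},
    {"name": "Beth", "dob": "1/10/81", "sex": "Female", "zipcode": "ZIP-D"},
]


def link(records, register):
    """Join two data sets on the quasi-identifiers; return (name, disease) pairs."""
    index = {}
    for person in register:
        key = tuple(person[q] for q in QUASI_IDENTIFIERS)
        index.setdefault(key, []).append(person["name"])
    matches = []
    for rec in records:
        key = tuple(rec[q] for q in QUASI_IDENTIFIERS)
        names = index.get(key, [])
        if len(names) == 1:  # a unique match re-identifies the patient
            matches.append((names[0], rec["disease"]))
    return matches


print(link(hospital, voters))  # [('Andre', 'Heart Disease')]
```

Neither data set contains a name next to a disease, yet the join recovers one: this is why removing names alone does not anonymise a data set.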

24 Privacy protection The biobanking data protection scheme of the German Telematics Platform for Networked Medical Research (TMF) relies on: separation of informational powers, anonymisation or pseudonymization of sensitive data, adoption of trusted third party services for identity management. The separation of informational powers refers to the physical separation of: identity data (patient list), clinical data (clinical database), medical annotation data (research database), logistic information (sample database), analysis results data (analysis database).

25 Privacy protection Anonymisation aims to prevent the identification of the person to whom the data refer. It consists in the removal of any information that, alone or in combination, may lead to the identification of the person. The anonymisation challenge is to create a version of the data set where: the persons cannot be identified, the data remain useful for the objective of the study. The degree of anonymisation of a data set may be assessed by computing its k-anonymity property. By definition, a data set has the k-anonymity property when no query on the data set can produce a non-empty output with fewer than k records. In other words, k-anonymity is achieved when the information on any given person cannot be distinguished from the information of at least k-1 other persons.
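The k value of a data set can be computed by grouping records on the attributes that could identify a person and taking the size of the smallest group. A minimal Python sketch, assuming the identifying attributes (here called quasi-identifiers) are known in advance; the function name and the example rows are hypothetical.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group of rows sharing the same
    quasi-identifier values (0 for an empty data set)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values(), default=0)


# Hypothetical, already generalized records.
rows = [
    {"age": "50-55", "region": "Liguria", "disease": "Flu"},
    {"age": "50-55", "region": "Liguria", "disease": "Hepatitis"},
    {"age": "30-35", "region": "Liguria", "disease": "Bronchitis"},
    {"age": "30-35", "region": "Liguria", "disease": "Flu"},
]
print(k_anonymity(rows, ["age", "region"]))  # 2
```

Here every combination of age range and region is shared by two records, so each person is indistinguishable from one other person (k = 2).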

26 Privacy protection K-anonymity is usually achieved by suppression or generalization. Suppression consists in the substitution of the information with some meaningless fixed term or character, e.g. an '*'. The relative information is lost. Generalization consists in the substitution of actual values with some more general category. For example: the age can be substituted by a range (52 => 50-55), the town by the region or the country (Genova => Liguria). The relative information is reduced, but not lost.
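The two operations can be sketched in a few lines of Python. The function names and the town-to-region lookup table are hypothetical; the examples reuse the values given above (52 => 50-55, Genova => Liguria).

```python
def generalize_age(age, width=5):
    """Replace an exact age with a range, e.g. 52 -> '50-55'.
    Ranges start at multiples of `width`, so 55 falls in '55-60'."""
    low = (age // width) * width
    return f"{low}-{low + width}"


# Hypothetical lookup table; a real one would cover every town in the data.
TOWN_TO_REGION = {"Genova": "Liguria"}


def generalize_town(town):
    """Replace a town with its region; suppress it when the region is unknown."""
    return TOWN_TO_REGION.get(town, "*")


def suppress(_value):
    """Replace a value with a fixed, meaningless placeholder."""
    return "*"


record = {"name": "Andre", "age": 52, "town": "Genova"}
anonymised = {
    "name": suppress(record["name"]),          # information lost
    "age": generalize_age(record["age"]),      # information reduced
    "town": generalize_town(record["town"]),   # information reduced
}
print(anonymised)  # {'name': '*', 'age': '50-55', 'town': 'Liguria'}
```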

27 Privacy protection K-anonymity may significantly reduce the informative content of the data set, and this reduces the significance of statistical analysis. A possible alternative is pseudonymisation, where the data are replaced by fictitious identifiers, the pseudonyms. Pseudonyms must be: coherent (same pseudonym for a given data value), not derived through an algorithm or rule that others could reverse. The person becomes less identifiable, but: the data remain suitable for processing and statistical analysis, the pseudonym can still be mapped back to its source data by whoever holds the mapping.
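One way to satisfy both requirements is to draw pseudonyms at random and keep the mapping in a private table, for instance at a trusted third party. The sketch below is a minimal Python illustration under that assumption; the class and method names are hypothetical.

```python
import secrets


class Pseudonymizer:
    """Keeps a private mapping between real values and random pseudonyms."""

    def __init__(self):
        self._forward = {}   # real value -> pseudonym
        self._backward = {}  # pseudonym -> real value

    def pseudonym_for(self, value):
        """Return the same pseudonym every time the same value is seen."""
        if value not in self._forward:
            token = secrets.token_hex(8)
            while token in self._backward:  # extremely unlikely collision
                token = secrets.token_hex(8)
            self._forward[value] = token
            self._backward[token] = value
        return self._forward[value]

    def resolve(self, pseudonym):
        """Map a pseudonym back to its source value (mapping holder only)."""
        return self._backward[pseudonym]


p = Pseudonymizer()
alias = p.pseudonym_for("patient-0001")
print(alias == p.pseudonym_for("patient-0001"))  # True: coherent
print(p.resolve(alias))                          # patient-0001
```

Because the pseudonym is random rather than computed from the value, it cannot be reversed without access to the mapping table.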

28 Biobank networking In general, the best experimental results can be reached by applying statistical methods to large data sets. A large number of samples, which is seldom available in a single biobank, is required, especially for studies on rare diseases. Co-ordination among biobanks is needed to gather a greater number of samples for the same experiment. Biobanks are therefore increasingly networked. Biobank information systems are heterogeneous and integration poses many technological issues, including harmonization of data and integrated searches. Interoperability technologies can be exploited for the creation of data description frameworks and flexible data sharing services.

29 BBMRI-ERIC project Biobanking and BioMolecular Resources Research Infrastructure - European Research Infrastructure Consortium (BBMRI-ERIC). Funded by the EU / ESFRI, it includes 19 European Member States and one International Organisation. It is a distributed infrastructure of biobanks and biomolecular resource centers providing access to: human biological samples, biomolecular research and services, biocomputational tools.

30 BBMRI-ERIC: mission The BBMRI-ERIC mission includes: harmonization of procedures, implementation of common standards, association of data to samples in an efficient and ethically and legally compliant manner. It provides tools and expertise on: quality management for biobanks and research on biomolecular resources; information technologies for biobanks and for research on biomolecular resources (Common Service IT); ethical, legal and societal implications for the biobanking community (Common Service ELSI).

31 BBMRI-ERIC: achievements Current achievements include: the BBMRI-ERIC Directory, a tool to share aggregate information about involved biobanks (currently, 1,379); the Minimum Information About BIobank data Sharing (MIABIS) standard, which describes the minimum information required to initiate collaborations between biobanks and to enable the exchange of biological samples and data; BiobankApps, a catalogue of biobank-related software, where software developers and vendors can register their software, supporting user-based reviews.

32 BBMRI-ERIC: integrated platform An integrated platform where data is retrieved on demand from participating biobanks is under development. The architectural choices of the BBMRI-ERIC platform are: i) the components of the platform should be web applications communicating through REST / REST-like Web Services; ii) all technologies used should be open source, and the software produced should be shared as open source; iii) software should have stable releases in sync with stable interfaces for the overall stability of the platform; iv) each biobank maintains its own proprietary IT structure, which is mapped to a shared data model; v) global interoperability is achieved via a two-level hub-and-spoke architecture, where hubs connect both to each other and to many spokes.

33 BBMRI-ERIC: integrated platform Regarding privacy protection, the following approach will be followed: in a central repository, sample data will only be made accessible grouped by collection, not as single samples; collections are defined as containers for sets of samples and may include sub-collections; properties of the samples in a collection are described in aggregated form, e.g. sample counts by disease and by material type; searching data at the non-aggregated sample level will be subject to data owner approval.
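Describing a collection in aggregated form means publishing counts rather than individual records. A minimal Python sketch of such an aggregation follows; the record layout and names are hypothetical and do not reflect the actual BBMRI-ERIC data model.

```python
from collections import Counter

# Hypothetical sample-level records held locally by a biobank.
samples = [
    {"collection": "col-1", "disease": "colorectal cancer", "material": "tissue"},
    {"collection": "col-1", "disease": "colorectal cancer", "material": "DNA"},
    {"collection": "col-1", "disease": "breast cancer", "material": "tissue"},
    {"collection": "col-2", "disease": "breast cancer", "material": "serum"},
]


def aggregate(samples):
    """Summarise samples per collection as counts by disease and by material type."""
    summary = {}
    for s in samples:
        entry = summary.setdefault(
            s["collection"], {"by_disease": Counter(), "by_material": Counter()}
        )
        entry["by_disease"][s["disease"]] += 1
        entry["by_material"][s["material"]] += 1
    return summary


summary = aggregate(samples)
print(dict(summary["col-1"]["by_material"]))  # {'tissue': 2, 'DNA': 1}
```

Only the summary would be shared centrally; the sample-level list stays with the data owner.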

34 BBMRI-ERIC: integrated platform The architecture of the BBMRI-ERIC platform is based on four components: the Directory, the Sample Locator, the Negotiator, the Connector. A Metadata Repository (MDR) is also included for data harmonization in all communications between the various components. It contains the definitions, descriptions, validations and types of all data used to define samples and related information.

35 BBMRI-ERIC: integrated platform

36 BBMRI-ERIC: integrated platform The Directory stores data about biobanks and their collections in a central location. It makes it easy to find and aggregate biobank information, thus enabling users to identify biobanks that may have samples of interest. Only summary-level, anonymous information can be shared via the Directory.

37 BBMRI-ERIC: integrated platform The Sample Locator complements the Directory by answering requests at the sample level, while giving the biobanks full control over data requests. Non-aggregated sample data will be pseudonymised. The Sample Locator supports the heterogeneity of biobank data structures. Search queries are based on items from the MDR. Along with the query, requests may include a description clarifying the user's need for the specific samples and the contact information of the user.

38 BBMRI-ERIC: integrated platform The Connector is the local interface of the biobanks. It: i) retrieves research inquiries from the Sample Locator, ii) queries local data sources, iii) enables data owners to manage requests by presenting them with sample-level requests, together with associated information, for approval. The response process can be automated for trusted requesters.
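The three steps can be pictured as a small request-handling routine. The Python sketch below is only an illustration of the workflow described above, not the actual Connector software or its API; all names, the local sample list and the trust list are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Inquiry:
    """Step i: a research inquiry as it might arrive from the Sample Locator."""
    requester: str
    disease: str
    material: str


# Hypothetical local data source and list of trusted requesters.
LOCAL_SAMPLES = [
    {"id": "s1", "disease": "hepatitis", "material": "serum"},
    {"id": "s2", "disease": "hepatitis", "material": "DNA"},
    {"id": "s3", "disease": "flu", "material": "serum"},
]
TRUSTED_REQUESTERS = {"trusted-lab"}


def query_local(inquiry):
    """Step ii: find local samples matching the inquiry."""
    return [s for s in LOCAL_SAMPLES
            if s["disease"] == inquiry.disease and s["material"] == inquiry.material]


def handle(inquiry, owner_approves):
    """Step iii: release matches only after approval; trusted requesters
    are approved automatically."""
    matches = query_local(inquiry)
    if inquiry.requester in TRUSTED_REQUESTERS or owner_approves(inquiry, matches):
        return [s["id"] for s in matches]
    return []


def deny_all(inquiry, matches):
    """Stand-in for a data owner who rejects every request."""
    return False


print(handle(Inquiry("trusted-lab", "hepatitis", "serum"), deny_all))  # ['s1']
print(handle(Inquiry("unknown-lab", "hepatitis", "serum"), deny_all))  # []
```

Passing the approval decision in as a function keeps the data owner in control while still allowing the automated path for trusted requesters.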

39 BBMRI-ERIC: integrated platform The Negotiator is a communication platform for sharing sample-level data between researchers and biobanks. Based on the request coming from the Directory, a data sharing negotiation process is started. The Negotiator makes it possible to refine the search requests for each biobank. For trusted requesters, the data is shared automatically.

40 Biobanks missing information Information and data owned by researchers who use materials in their own experiments are missing from biobank information systems. This information is only partially published in the literature, and the majority of such data remains unknown to the research community. A knowledge base able to store and organize this collective information could be useful. Wiki systems have emerged as a platform able to stimulate users to contribute to the collaborative building of a common knowledge base. Specific aims of wikis for biology (bio-wikis) are the collaborative: writing of papers and other documents, annotation of database contents, creation of database contents.

41 Biobanks missing information In the life sciences, wiki systems are already available for the management of a variety of biological data and information.
- Gene Wiki: a specialized subsection on human genes in Wikipedia; aims at building a high-quality page for each human gene.
- WikiGenes: able to manage and recognize contributors; encourages the collaborative creation of scientific papers; social features: user pages, rating by peers.
- WikiPathways: community annotation to complement some databases of metabolic pathways (KEGG, Reactome, Pathway Commons); pathway editor available.

42 Biobanks missing information Several issues still need to be addressed: i) reliability of user contributions, ii) desirable format for the annotations, iii) feedback of user-provided information into databases. Textual information is only a small part of biological data, which can be provided in many heterogeneous formats, including images, plots, and diagrams. Special features are therefore required to retrieve, store, search and display biological data in wiki systems.

43 Biobanks missing information Wiki systems could gather data on samples and studies from experts, towards a knowledge repository offering: i) storage of information on current and past studies, research groups, involved diseases; ii) a repository for data from experimental studies (e.g. gene expression, mass spectra, variations) and for extended clinical data (e.g. follow-up data); iii) cross-references to external databases (e.g. disease-related: OMIM and Locus Specific Data Bases); iv) social network features (e.g. peer review of study proposals, scores on the quality of materials and studies).

44 Acknowledgements Biobanca Centro Risorse Biologiche Ospedale Policlinico San Martino Genova Dott.ssa Barbara PARODI Istituto Tecnologie Biomediche CNR Dott. Luciano MILANESI Prof. Vincenzo TAGLIASCO Prof. Giulio SANDINI

45 BioBanks Contacts Ing. Paolo Romano Bioinformatica Ospedale Policlinico San Martino Genova Web: