Regional Collaborations over Distributed Cloud

1 Regional Collaborations over Distributed Cloud. Eric Yen, Academia Sinica Grid Computing Centre (ASGC), Taiwan. APAN46, Auckland, New Zealand.

3 e-science Applications Supported by ASGC Proton Therapy CryoEM Structural Biology Disaster Mitigation Soundscape Monitoring Astronomy ML Applications Security

4 Supporting Multi-Disciplinary e-science Applications & Moving towards Open Science with Enhanced Data-Oriented & Big Data Analysis Capabilities over the Common e-infrastructure in Academia Sinica

5 CPU: 24K cores; GPU: 260K cores; Disk: 20 PB. The power consumption and temperature of every piece of equipment are monitored every 10 seconds.

6 Support Big Data Processing in Particle Physics: 1.6M CPU-days and 15+ PB of data production in [...] [Figure: CPU-days and accumulated data volume (in + out, PB) by year, up to ~July of the latest year, for AMS and ATLAS]

7 Distributed Cloud Operating System (DiCOS): open-source middleware developed by ASGC for sharing resources across geographically distributed data commons.

9 25% of computing resources were shared under this model.

10 Optimized WMS for Harvesting Heterogeneous Resources and Integrating Containers. Server-Harvester-Pilot model. Coherent machinery for pilot provisioning on all computing resources. Timely optimization of CPU allocation among various resource types. Containers can be launched via a Python API and run the pilot wrapper immediately.
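The Server-Harvester-Pilot model can be illustrated with a minimal sketch. This is not DiCOS code: the names `Job`, `Server`, `Pilot`, and `harvest`, and the resource names, are hypothetical. It assumes only that a central server queues jobs, that a harvester starts pilots the same way on every resource type, and that each pilot pulls the jobs it is able to run.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    needs_gpu: bool = False


@dataclass
class Server:
    """Central job queue; a job is bound to a resource only when a pilot asks."""
    queue: deque = field(default_factory=deque)

    def submit(self, job):
        self.queue.append(job)

    def fetch(self, has_gpu):
        # Hand out the first queued job that this pilot's resource can run.
        for job in list(self.queue):
            if has_gpu or not job.needs_gpu:
                self.queue.remove(job)
                return job
        return None


@dataclass
class Pilot:
    resource: str
    has_gpu: bool

    def run(self, server):
        # Keep pulling jobs until the server has nothing suitable left.
        done = []
        while (job := server.fetch(self.has_gpu)) is not None:
            done.append((self.resource, job.job_id))
        return done


def harvest(server, resources):
    """Hypothetical harvester: provision one pilot per resource, uniformly."""
    results = []
    for name, has_gpu in resources:
        results.extend(Pilot(name, has_gpu).run(server))
    return results


server = Server()
for i in range(4):
    server.submit(Job(i, needs_gpu=(i % 2 == 1)))  # odd job IDs require a GPU

results = harvest(server, [("cpu-cluster", False), ("gpu-farm", True)])
print(results)
```

In this sketch the CPU pilot skips GPU-only jobs and leaves them in the queue for the GPU pilot, which is the point of letting pilots pull work instead of pushing jobs to fixed resources.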

11 GPU Resources Integration. Taking advantage of containers, we make our computing platform more flexible for different computing requirements. Dynamic optimization of payload size. CPU/GPU resource allocation for MPI jobs of various requirements. Support for various workflows, such as job/event-level late binding and jumbo jobs. [Diagram labels: DDM; PanDA; fetch job; update status; stage-in input; stage-out output; shared FS (GlusterFS); Harvester; Job API; GPU nodes (GTX 1080Ti, 260,000 cores)]
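One way to picture dynamic payload sizing with event-level dispatch is the sketch below. It is an illustration under stated assumptions, not the DiCOS or PanDA algorithm: the function names, the per-event timings, and the target run time are all hypothetical, and a simple round-robin loop stands in for workers requesting more events as they become free.

```python
def payload_size(target_seconds, seconds_per_event, min_events=1, max_events=10_000):
    """Number of events that keeps one payload close to target_seconds."""
    n = int(target_seconds // seconds_per_event)
    return max(min_events, min(max_events, n))


def dispatch_events(total_events, workers, target_seconds=600):
    """Assign contiguous event ranges to workers until no events remain.

    `workers` maps a worker name to its (assumed) measured seconds per event.
    Returns a list of (worker, first_event, end_event) with end exclusive.
    """
    next_event = 0
    assignments = []
    while next_event < total_events:
        for name, sec_per_event in workers.items():
            if next_event >= total_events:
                break
            size = payload_size(target_seconds, sec_per_event)
            end = min(total_events, next_event + size)
            assignments.append((name, next_event, end))
            next_event = end
    return assignments


# Hypothetical timings: the GPU node is four times faster per event.
plan = dispatch_events(1000, {"gpu-node": 0.5, "cpu-node": 2.0}, target_seconds=100)
for worker, first, end in plan:
    print(worker, first, end)
```

Faster workers receive larger event ranges, so every payload runs for roughly the same wall time regardless of the resource type.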

12 Analysis/Application as a Service: integrating DiCOSBox, Jupyter, containers, and analysis tools. Analyses can be reproduced, shared easily, carried out with only a web browser, and integrated with other analysis toolkits.

13 Jupyter Notebook on GPU: Jupyter notebooks run jobs on GPUs through the DiCOS Web-UI. Machine learning and deep learning environments such as Keras and TensorFlow are provided.

14 Future Work: Open Science & Open Data. Reproduce work in a web browser by accessing files directly from DiCOSBox (cloud). Easier sharing and reuse via the web interface. Automatic and scheduled syncing by the client. Office Online integration: native Office in the browser.

15 AAI Infrastructure

16 The AARC Blueprint Architecture (BPA) is a set of software building blocks that can be used to implement federated access management solutions for international research collaborations. The Blueprint Architecture lets software architects and technical decision makers mix and match tried and tested components to build customized solutions for their requirements. Guidelines and supporting documents: reference architecture, conventions and community standards, best policy practices, implementation hints, and training for FIM communities.

17 IdP/SP Proxy 15/06/

18 Examples

19 DiCOS Supports Various Pipelines of Bioinformatics Analysis by Scalable & High-Throughput Distributed Infrastructure. Multiple data sources (GEUVADIS, 1000 Genomes) are registered with ARK IDs and metadata and shared by the Distributed Data Management (DDM) of DiCOS. Pipelines: RNA-Seq data preprocessing (STAR, Picard, Cufflinks); variant discovery (GATK); analysis and evaluation. Access via WebUI, CLI, and API to the Distributed Job Management (DJM) of DiCOS. Remote resources are used for data processing and analysis. [Diagram data format labels: FASTQ, BAM, VCF]

20 Architecture for Cryo-EM Applications. [Diagram labels: campus network; 100Gb switch; MLXe8 (100G); ASGC; DTNs (100G); FDR/IB switches; 10GbE switch; file server; web server; storage server; GPU farm; buffer server; DB server; HTC farm; local GPU; Cryo-EM site (Interdisciplinary Building); TEM; processing server] Roles: hot backup sites for all kinds of data, from raw data to results; batch data analysis and processing by RELION over the GPU farm; data access/delivery services; system efficiency improvement; resource federation. DTN = Data Transfer Node; IB = InfiniBand.

21 WebUI of DiCOS for the CryoEM Analysis Workflow with RELION and cryoSPARC

22 APAN Serves as Regional Collaboration Platform. APAN: enabling advanced R&E applications by networking and collaborations (NREN, community, application). Leverage e-infrastructure for public services by the TEIN network. [Diagram labels: Sentinel Asia (JAXA); DMCC+ (open science and infrastructure, services); haze/smoke monitoring; UNESCO; UND; APAN (DMWG); JP, IN, CZ, DE; IR, PK; KR, LK, NP; ID, MY, PH, TH, VN, TW; BD, MM; and all Asian countries]

23 Open Science Platform of DMCC+

24 Animation of Tsunami Wave Propagation: simulation of the 2004 Indian Ocean tsunami in Banda Aceh by iCOMCOT

26 Building Asia Regional Soundscape Monitoring Network. Operational since Oct 29, [...]. [...]M+ sound files (33K+ hours, 100TB+) from 18 sites in 4 countries. New sites: VN, TH, MY, PH, LA, and KH. The ASEAN Center for Biodiversity (ACB) is also a collaborator.

27 Biodiversity Monitoring & Dynamics Enhanced by Multi-Layered Non-Negative Matrix Factorization (MLNMF). Blind Source Separation (BSS): separating soundscape components from long-duration recordings; separating different species of animal vocalizations; searching for target signals in noisy recordings.
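The idea behind NMF-based source separation can be sketched with a single-layer factorization. This is plain NMF with the standard multiplicative updates, not the multi-layered MLNMF named on the slide, and the toy "spectrogram" is synthetic. It assumes a non-negative matrix V (frequency bins by time frames) that is approximated as W times H, where each column of W is a spectral pattern and each row of H is that pattern's activation over time.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative v (freq x time) into w (freq x rank), h (rank x time)."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(cols)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(rows)]
    return w, h


# Synthetic example: two sources with different spectral shapes (4 bins)
# that are active at different times (5 frames).
source_a, source_b = [1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.8]
act_a, act_b = [1.0, 0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0, 0.5]
v = [[source_a[f] * act_a[t] + source_b[f] * act_b[t] for t in range(5)]
     for f in range(4)]

w, h = nmf(v, rank=2)
error = frobenius_error(v, w, h)
print("reconstruction error:", error)
```

Each recovered component can then be turned back into its own spectrogram by multiplying one column of W with the matching row of H, which is how individual sound sources are isolated from a mixed recording.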

28 System Architecture. Soundscape web portal: soundscape.twgrid.org. Metadata services (CKAN + in-house code). Ceph storage (scalable storage, 2 copies). CPU-intensive processes will be taken care of by DiCOS: data analysis and conversion, batch processing by DiCOS (distributed cloud) with O(10K) cores and PB-scale storage. Resource federation: partner sites with sensors, a local data store (will need a reference model), and local cloud resources; more partner sites.

29 Summary. Research infrastructure will evolve progressively through interactions with big data applications. Efficiency is enhanced by big data analysis. Simpler and more secure SSO, leveraging IdFed and AAI. A science gateway is developed for customized workflows over the shared infrastructure and DiCOS. Support for open data, open technology, and open collaborations.