
1 Lustre Beyond HPC: Toward Novel Use Cases for Lustre?
  Presented at LUG /03/31
  Robert Triendl, DataDirect Networks, Inc.

2 This is a vendor presentation!

3 Lustre Today
  - The undisputed file system of choice for HPC:
    - large-scale parallel I/O for very large HPC clusters (several thousand nodes or larger)
    - applications that generate very large datasets but only a relatively limited amount of metadata traffic
  - Alternative solutions are typically more expensive (since proprietary) and not nearly as scalable as Lustre

4 Lustre Market Overview: Data for the Japan Market
  (Chart: 100% stacked view of the Japan market by segment: Cloud & Content, Business Data Analysis, HPC "Work", Corporate Research Data Analysis, HPC Archive)

5 What Happened? Example: Storage Market Japan
  - Lustre market expansion:
    - from a relatively small player to the dominant HPC file system
    - from tens of OSSs to hundreds of OSSs (and, when including the K system, literally thousands of OSSs)
  - But, also:
    - Lustre has become synonymous with large-scale HPC
    - experiments to use Lustre in other markets and for other applications have decreased rather than increased

6 Overall Storage Market Evolution
  - Software-defined storage is replacing traditional storage architectures in large cloud deployments
  - Many Lustre alternatives are now available:
    - Ceph, OpenStack Swift, SwiftStack, Gluster, etc.
    - various Hadoop distributions
    - commercial object storage (DDN WOS, Scality, Cleversafe, etc.)
  - Lustre remains exotic and is rarely even considered an option

7 (Diagram: Lustre feature areas arranged around a pie chart of market segments)
  - Feature areas: I/O Caching, Connectors, Client Performance, NFS/CIFS Access, Large I/O, Management, Cluster Integration, RAS Features, Data Management, Project Quota, Backup/Replication, Small File I/O, Object/Cloud Links, SSD Acceleration, HSM, Fine-Grained Monitoring
  - Market segments: BA 6%, HPC "Work" 24%, HPC "Work" Corp 15%, HPC Archive 23%, Research Data Analysis 35%

8 (Slide repeats the feature-area diagram from slide 7)

9 Nagoya University Acceptance Benchmark
  - Large cluster, FPP (file per process) I/O from each core in the cluster (a minimal sketch of the FPP pattern follows below)
  - Most efficient configuration with 3 TB/4 TB drives
  - threads per Lustre OST
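For readers unfamiliar with the pattern, here is a minimal sketch of file-per-process (FPP) writes, in which every rank writes its own file. The mount point, file size, and use of mpi4py are illustrative assumptions, not details taken from the Nagoya benchmark.

```python
# Minimal FPP write sketch, assuming mpi4py is installed and a Lustre file
# system is mounted at /lustre/scratch (hypothetical path). Launch with
# mpirun so that each rank/core writes its own file.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

block = b"\0" * (1 << 20)                      # 1 MiB write buffer
path = f"/lustre/scratch/fpp_{rank:05d}.dat"   # one file per process

with open(path, "wb", buffering=0) as f:       # unbuffered binary writes
    for _ in range(1024):                      # 1 GiB per process
        f.write(block)

comm.Barrier()                                 # all ranks finish before external timing stops
```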

10 Nagoya University Initial Data: FPP with a Large Number of Threads
  (Chart: write performance per single OST in GB/sec vs. processes per OST)

11 Large I/O Patches
  (Charts: Lustre backend write and read performance degradation per OST, as % of peak for each dataset, vs. number of processes, comparing 7.2K SAS with 1 MB RPCs, 7.2K SAS with 4 MB RPCs, and SSD with 1 MB RPCs)
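The 1 MB vs. 4 MB RPC comparison above maps to a client-side tunable. The following is a hedged sketch only: osc.*.max_pages_per_rpc is a standard Lustre parameter (1024 pages of 4 KiB = 4 MiB), but the exact name and behavior should be verified against the Lustre version in use, the servers must also be built with large-RPC support, and the Python wrapper is purely illustrative.

```python
# Hedged sketch: raise the client-side bulk RPC size from 1 MB to 4 MB by
# setting osc.*.max_pages_per_rpc (1024 x 4 KiB pages = 4 MiB). Requires root
# on a Lustre client whose servers support 4 MB RPCs; parameter name assumed
# from common Lustre conventions, not taken from the slides.
import subprocess

def set_rpc_pages(pages: int = 1024) -> None:
    """Set max_pages_per_rpc on all OSC devices of this client."""
    subprocess.run(
        ["lctl", "set_param", f"osc.*.max_pages_per_rpc={pages}"],
        check=True,
    )

def show_rpc_pages() -> str:
    """Return the current max_pages_per_rpc settings for inspection."""
    out = subprocess.run(
        ["lctl", "get_param", "osc.*.max_pages_per_rpc"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout

if __name__ == "__main__":
    set_rpc_pages(1024)
    print(show_rpc_pages())
```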

12 Raw Device Performance: Write
  (Charts: sgpdd-survey write throughput in MB/sec vs. total number of threads, for Hitachi 7.2K NL-SAS RAID6 and Seagate ES3 7.2K NL-SAS RAID6, with crg = 1 through 256)

13 Raw Device Performance: Read
  (Charts: sgpdd-survey read throughput in MB/sec vs. total number of threads, for Hitachi 7.2K NL-SAS RAID6 and Seagate ES3 7.2K NL-SAS RAID6, with crg = 1 through 256)

14 (Slide repeats the feature-area diagram from slide 7)

15 Output Data from a Simulation
  - Requirements:
    - random reads against multiple large files (the access pattern is sketched below)
    - 2 million 4K random read IOPS
  - Solution:
    - Lustre file system with 16 OSS servers
    - two SFA12K (or one SFA12KXi)
    - 40 SSDs as Object Storage Devices
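As a minimal, single-threaded sketch of the 4 KiB random-read access pattern behind this requirement: the file path and read count below are hypothetical, and reaching millions of IOPS naturally requires many clients and threads running this kind of loop in parallel against SSD-backed OSTs, not one process.

```python
# Hedged sketch of 4 KiB aligned random reads against one large file on a
# Lustre file system; the path is hypothetical and the file must already exist.
import os
import random
import time

PATH = "/lustre/ssd_pool/refdata.bin"   # hypothetical large output file
IO_SIZE = 4096                          # 4 KiB per read
N_READS = 100_000

fd = os.open(PATH, os.O_RDONLY)
size = os.fstat(fd).st_size

# Pre-generate 4 KiB-aligned random offsets so the timed loop only does I/O.
offsets = [random.randrange(0, size - IO_SIZE) // IO_SIZE * IO_SIZE
           for _ in range(N_READS)]

t0 = time.perf_counter()
for off in offsets:
    os.pread(fd, IO_SIZE, off)          # read and discard 4 KiB at a random offset
elapsed = time.perf_counter() - t0
os.close(fd)

print(f"{N_READS / elapsed:,.0f} random 4 KiB read IOPS (single thread)")
```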

16 Lustre 4K Random Read IOPS
  - Configuration: 4 OSSs, 10 SSDs, 16 clients
  (Chart: Lustre 4K random read IOPS for FPP (POSIX), FPP (MPIIO), SSF (POSIX), and SSF (MPIIO), at node/process counts 1n32p, 2n64p, 4n128p, 8n256p, 12n384p, 16n512p)

17 Genomics Workflows
  - Mixed workflows:
    - ingest
    - sequencing pipelines: large-file I/O
    - analytics workflows: mixed I/O
  - Various I/O issues:
    - random reads for reference data
    - small-file random reads

18 Random Reads with SSDs

19 Data Analysis Workflows
  - Scientific data analysis
  - Genomics workflows
  - Seismic data analysis
  - Various types of accelerators
  - Large scientific instruments in astronomy
  - Remote sensing and environmental monitoring
  - Microscopy

20 Data Analysis Workflows
  - Additional topics:
    - data ingest
    - data management and data retention
    - data distribution and data sharing

21 Hyperscale Storage: HPC, Cloud, Data Analysis

  High Performance Computing        | Cloud Computing
  ----------------------------------|-----------------------------------
  Mostly (very) large files (GBs)   | Small and medium size files (MBs)
  Mostly write I/O performance      | Mostly read I/O performance
  Mostly streaming performance      | Mostly transactional performance
  10s of petabytes of data          | 10s of billions of files
  Scratch data                      | WORM & WORN
  100,000s of cores                 | 10s of millions of cores
  Mostly InfiniBand                 | Almost exclusively Ethernet
  Single location                   | Highly distributed data
  Very limited replication factor   | High replication factor
  High efficiency                   | Low efficiency

  Data Analysis Workflows (shown between the two columns)

22 (Slide repeats the feature-area diagram from slide 7)

23 A (DDN) Vision for Lustre
  - Maximum sequential and transactional performance per storage sub-system CPU
  - Caching at various layers within the data path
  - Increased single-node streaming and small-file performance
  - Millions of metadata operations in a single FS
  - Millions of random (read) IOPS within a single FS

24 A (DDN) Vision for Lustre (cont.)
  - Data management features, including (cloud) tiering, fast and efficient data backup, and data lifecycle management (an HSM sketch follows below)
  - Novel usability features such as cluster integration, QoS, directory-level quota, etc.
  - Extremely high backend reliability for small and mid-sized systems
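As one concrete building block for tiering and lifecycle management, here is a hedged sketch of driving Lustre HSM from a script. lfs hsm_archive, lfs hsm_state, and lfs hsm_release are standard Lustre commands, but the file path and the Python wrapper are illustrative assumptions, and a configured HSM coordinator plus copytool backend is required for any of this to work.

```python
# Hedged sketch: archive a cold file to the HSM backend, inspect its HSM
# state, and release its data blocks from the OSTs so only metadata stays
# online. Assumes an HSM coordinator and copytool are already configured;
# the path below is hypothetical.
import subprocess

def hsm_archive_and_release(path: str) -> None:
    # Request a copy of the file's data to the archive tier (server-side, asynchronous).
    subprocess.run(["lfs", "hsm_archive", path], check=True)
    # Print the current HSM state flags for the file (e.g. "exists archived").
    subprocess.run(["lfs", "hsm_state", path], check=True)
    # Free the OST objects; in practice, wait until hsm_state reports "archived"
    # first, since releasing an unarchived file will fail. Later reads trigger restore.
    subprocess.run(["lfs", "hsm_release", path], check=True)

if __name__ == "__main__":
    hsm_archive_and_release("/lustre/project/cold/archive_me.dat")
```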

25 Futures for Lustre?
  - Work closely with users:
    - user problems are the best source for future direction
    - translate user problems into roadmap priorities
  - Work closely with the Lustre community:
    - work very closely with OpenSFS and Intel HPDD on Lustre roadmap priorities and various other topics

26 Futures for Lustre?

27