Azure Data Factory V2 / SSIS. Christian Cote

1 Azure Data Factory V2 / SSIS Christian Cote

2 Whoami
- ETL (extract, transform and load) Tech Leader
- ETL development using various ETL tools: DTS / SSIS, Hummingbird Genio, Informatica, DataStage
- DW experience in various domains: pharmaceutical, finance, insurance, manufacturing and education
- Specialized in data warehouse / BI / big data
- Writer of several books on data integration
- Microsoft Data Platform Most Valuable Professional (MVP)
- Montreal Data Platform PASS chapter leader

3 Life isn't about waiting for the storm to pass. It's about learning to dance in the rain.

8 Data sources
1. Increasing data volumes
2. Real-time data
3. New data sources & types (non-relational data)
4. Cloud-born data

10 Evolving Approaches to Analytics
Diagram: Extract, Transform, Load. Original Data; ETL Tool (SSIS, etc.); Transformed Data; EDW (SQL Server, Teradata, etc.); BI Tools; Data Marts; Data Lake(s); Dashboards; Apps.

11 Evolving Approaches to Analytics
Diagram: Extract, Transform, Load. Original Data; ETL Tool (SSIS, etc.); Transformed Data; EDW (SQL Server, Teradata, etc.); BI Tools; Data Marts; Data Lake(s); Dashboards; Apps. Added path: Ingest (EL) of Original Data.

12 Evolving Approaches to Analytics
Diagram: Extract, Transform, Load. Original Data; ETL Tool (SSIS, etc.); Transformed Data; EDW (SQL Server, Teradata, etc.); BI Tools; Data Marts; Data Lake(s); Dashboards; Apps. Added path: Ingest (EL) of Original Data and Streaming data into Scale-out Storage & Compute (HDFS, Blob Storage, etc.), followed by Transform & Load.

16 Target Scenarios
- Lift existing SSIS + SQL to the cloud
  - Moving infrastructure to the cloud (customers no longer maintain their own data centers)
  - Run existing SSIS packages in a managed cloud environment
- Modern data warehouse
  - For enterprise analytics: modernizing from a traditional DW to reduce cost and scale to the variety and volume of big data
  - For data-driven SaaS apps: ISVs building their apps out of IM PaaS building blocks
- Diagram labels: Data Tier, App Tier

17 New Pipeline Model
- Rich pipeline orchestration: triggers can be on-demand, schedule or event
- Data Movement as a Service: cloud and hybrid, 67+ connectors provided
- SSIS package execution: in a managed cloud environment, using familiar tools (SSMS & SSDT)
- Author & Monitor: programmability (Python, .NET, PowerShell, etc.) and visual tools
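As a rough illustration of the "author via code" point, the sketch below builds a pipeline definition and a schedule trigger as plain Python dictionaries shaped like ADF V2 JSON. The pipeline, activity, dataset and trigger names are hypothetical placeholders, and the source and sink types are only one possible pairing; the slides do not show an actual definition.

```python
import json


def build_copy_pipeline(name, input_dataset, output_dataset):
    """Return a pipeline definition with a single Copy activity.

    All names passed in are hypothetical; they are not taken from the deck.
    """
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopyBlobToSql",  # hypothetical activity name
                    "type": "Copy",
                    "inputs": [
                        {"referenceName": input_dataset, "type": "DatasetReference"}
                    ],
                    "outputs": [
                        {"referenceName": output_dataset, "type": "DatasetReference"}
                    ],
                    "typeProperties": {
                        # Assumed pairing: blob storage source, SQL sink.
                        "source": {"type": "BlobSource"},
                        "sink": {"type": "SqlSink"},
                    },
                }
            ]
        },
    }


def build_schedule_trigger(name, pipeline_name, start_time, frequency="Hour", interval=1):
    """Return a schedule trigger definition that starts the named pipeline."""
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": frequency,
                    "interval": interval,
                    "startTime": start_time,
                    "timeZone": "UTC",
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    }
                }
            ],
        },
    }


pipeline = build_copy_pipeline("DemoCopyPipeline", "InputBlobDataset", "OutputSqlDataset")
trigger = build_schedule_trigger("HourlyTrigger", pipeline["name"], "2018-01-01T00:00:00Z")
print(json.dumps(trigger, indent=2))
```

The resulting JSON documents would still have to be deployed through one of the authoring routes listed on the slide (visual tools, PowerShell, an SDK); that step is left out here.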

18 Demo: Differences between ADF V1 and ADF V2

19
- Scalable: per-job elasticity, up to 1 GB/s data movement
- Simple: author visually or via code (Python, .NET, etc.); serverless, no infrastructure to manage
- Access all your data: 60+ connectors provided and growing (cloud, on premises, SaaS)
- Data Movement as a Service: 17 points of presence worldwide; self-hostable Integration Runtime for hybrid movement

20 Integration Runtime for SSIS
- Managed cloud environment: pick the number of nodes and the node size; resizable; SQL Standard Edition, Enterprise coming soon
- Compatible: same SSIS runtime across Windows, Linux and Azure Cloud
- SSIS + SQL Server: SQL Managed Instance + SSIS (in ADFv2); access on-premises data via VNet
- Get started: hourly pricing (no SQL Server license required); use existing license (coming soon)

23 Pipeline SSIS Package

24 Demo: SSIS Integration Runtime

26 Added Since Public Preview / Targeted by GA (by area)
- Activities: Custom Activity (exec process); Azure Databricks activity; Support Ent. Secure Package in HDI; additional sources for GetMetadata and Lookup activity
- Orchestration: iteration (foreach, do-until); conditional (if-else)
- Triggers: Tumbling Window (backfill, stateful rerun); File Arrival in Blob; Schedule Roll-Up (enable v1 scenarios)
- Data Movement: Connectors +31 (SAP Cloud for Customer, Informix, Google BigQuery, ServiceNow, Marketo, Azure MySQL, etc.); Monitor: ingestion progress & performance; Regions: +West US2 (now 19 regions total); Connectors +3 (SAP ECC, Netezza, Vertica); Throughput: 10x increase for Cosmos DB ingest
- Tools: visual control flow, data movement and monitoring
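The "Tumbling Window (backfill, stateful rerun)" item can be pictured with a small plain-Python sketch. It only illustrates the idea of fixed-size, contiguous, non-overlapping windows and of re-running the ones that have not completed; it is not the Data Factory trigger implementation, and the function names and the `already_run` set are assumptions made for the example.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs.

    Only windows that fit entirely before `end` are produced.
    """
    if size <= timedelta(0):
        raise ValueError("window size must be positive")
    window_start = start
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


def backfill(start, now, size, already_run):
    """Return the complete windows between start and now that still need a run.

    `already_run` is a hypothetical set of window start times recorded as done.
    """
    return [w for w in tumbling_windows(start, now, size) if w[0] not in already_run]


start = datetime(2018, 1, 1, 0, 0)
now = datetime(2018, 1, 1, 3, 30)
pending = backfill(start, now, timedelta(hours=1), already_run={start})
for window_start, window_end in pending:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Because each window is identified by its start time, a failed window can be re-run on its own without touching its neighbours, which is the property the "stateful rerun" wording points at.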

27 Added Since Public Preview / Targeted by GA (by area)
- VNet: ARM VNet support
- License Type: Enterprise edition (private preview)
- Extensions: custom setup interface (private preview)
- Region Support: Australia East, Central US
- Node Sizes: more options with more memory (add D series v3, add E series v3)
- Pricing: Azure Hybrid Use Benefit (aka BYOL)

28 Added Since Public Preview / Targeted by GA (by area)
- Authentication & Key Mgmt: MSI auth for ADLS; AAD auth for SQL DB/DW; Linked Services credentials integrated with AKV
- v1 to v2 Migration Tool: migrate as-is; migrate based on patterns/heuristics (1000 table copies -> 1000 activities, model as a loop); from JSON in a folder and from an ADF instance
- Self-Hosted Integration Runtime: visual creation/mgmt; instances shareable across multiple data factories; MySQL/Postgres drivers bundled
- Cloud: visual creation/mgmt
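The migration heuristic on this slide (1000 table copies modeled as a loop rather than 1000 activities) could look roughly like the sketch below. It is a hedged illustration, not the migration tool's output: the function name, pipeline name, dataset names and the `tableName` dataset parameter are all hypothetical, and the source and sink types are an assumed pairing.

```python
def copies_as_loop(pipeline_name, table_names, source_dataset, sink_dataset):
    """Return one pipeline with a single ForEach activity over table_names.

    The loop body is one parameterized Copy activity, so the number of
    activities stays constant however many tables are listed.
    """
    copy_activity = {
        "name": "CopyOneTable",  # hypothetical activity name
        "type": "Copy",
        "inputs": [
            {
                "referenceName": source_dataset,
                "type": "DatasetReference",
                # Assumes the dataset declares a `tableName` parameter.
                "parameters": {"tableName": "@item()"},
            }
        ],
        "outputs": [
            {
                "referenceName": sink_dataset,
                "type": "DatasetReference",
                "parameters": {"tableName": "@item()"},
            }
        ],
        "typeProperties": {
            "source": {"type": "SqlSource"},
            "sink": {"type": "SqlSink"},
        },
    }
    return {
        "name": pipeline_name,
        "properties": {
            "parameters": {
                "tableNames": {"type": "Array", "defaultValue": list(table_names)}
            },
            "activities": [
                {
                    "name": "ForEachTable",
                    "type": "ForEach",
                    "typeProperties": {
                        "isSequential": False,
                        "items": {
                            "value": "@pipeline().parameters.tableNames",
                            "type": "Expression",
                        },
                        "activities": [copy_activity],
                    },
                }
            ],
        },
    }


tables = ["table_{}".format(i) for i in range(1000)]
loop_pipeline = copies_as_loop("CopyAllTables", tables, "SourceTable", "SinkTable")
print(len(loop_pipeline["properties"]["activities"]))
```

Keeping the table list in a pipeline parameter means a new table is added by changing data, not by adding another activity.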

31 "Azure Data Factory has enabled us to integrate heterogeneous data from multiple hospitals, allowing us to leverage big data and analytics offerings in Azure at scale to drive better health outcomes for our customers." - David B. McAuley, CTO, Lumedx
