COPYRIGHTED MATERIAL. Contents. Part One Requirements, Realities, and Architecture 1. Acknowledgments Introduction

Contents

Foreword
Preface
Acknowledgments
Introduction

Part One Requirements, Realities, and Architecture

Chapter 1 Defining Business Requirements
  The Most Important Determinant of Long-Term Success
  Uncovering Business Value
  Obtaining Sponsorship
  Defining Enterprise-Level Business Requirements
  The Prioritization Process
  Revisiting Project Planning
  Gathering Project Requirements
  Business Requirements Example: Adventure Works Cycles
  Interview Preparation at Adventure Works Cycles
  Adventure Works Cycles Enterprise Business Requirements
  Analytic Themes and Business Processes
  Adventure Works Cycles Bus Matrix
  The Adventure Works Cycles Prioritization Process
  Business Requirements for the Orders Project
  Summary

Chapter 2 Designing the Business Process Dimensional Model
  Dimensional Modeling Concepts and Terminology
  Facts
  Dimensions
  Bringing Facts and Dimensions Together
  The Bus Matrix, Conformed Dimensions, and Drill Across
  Additional Design Concepts and Techniques
  Surrogate Keys
  Slowly Changing Dimensions
  Dates
  Degenerate Dimensions
  Snowflaking
  Many-to-Many or Multivalued Dimensions
  Hierarchies
  Heterogeneous Products
  Aggregate Dimensions
  Junk Dimensions
  The Three Fact Table Types
  Aggregates
  The Dimensional Modeling Process
  Preparation
  Data Profiling and Research
  Building Dimensional Models
  Developing the Detailed Dimensional Model
  Testing the Model
  Reviewing and Validating the Model
  Case Study: Creating the Adventure Works Cycles Orders Dimensional Model
  Choosing the Dimensions
  Identifying Dimension and Fact Attributes for the Orders Business Process
  The Final Draft of the Initial Model
  The Issues List
  Detailed Dimensional Model Development
  Final Dimensional Model
  Summary

Chapter 3 The Toolset
  The Microsoft DW/BI Toolset
  Why Use the Microsoft Toolset?
  Architecture of a Microsoft DW/BI System
  Why Analysis Services?
  Arguments Against Analysis Services
  Why a Relational Store?
  Overview of the Microsoft Tools
  Which Products Do You Need?
  SQL Server 2005 Development and Management Tools
  Summary

Part Two Developing and Populating the Databases

Chapter 4 Setup and Physical Design
  System Sizing Considerations
  Calculating Data Volumes
  Determining Usage Complexity
  Estimating Simultaneous Users
  Assessing System Availability Requirements
  System Configuration Considerations
  How Much Memory?
  Monolithic or Distributed?
  What Kind of Storage System?
  Processors
  Setting Up for High Availability
  Software Installation and Configuration
  Development Environment Software Requirements
  Test and Production Software Requirements
  Operating Systems
  SQL Server Relational Database Setup
  Analysis Services Setup
  Integration Services Setup
  Reporting Services Setup
  Physical Data Warehouse Database Design
  Surrogate Keys
  String Columns
  To Null, or Not to Null?
  Insert an Unknown Member Row
  Table and Column Extended Properties
  Housekeeping Columns
  Indexing and Column Constraints
  Create Table Views
  Partitioned Fact Tables
  Aggregate Tables
  Staging Tables
  Metadata Setup
  Summary

Chapter 5 Designing the ETL System
  An Introduction to SQL Server Integration Services
  Overview of BI Studio Integration Services Tool
  Control Flow
  Data Flow
  Concepts for Dynamic Packages
  Event Handlers
  High-Level Planning
  Develop the First Draft High-Level Map
  Build a Sandbox Source System
  Perform Data Profiling
  Complete the Source-to-Target Mapping
  Load Frequency
  How Much History?
  Using Partitions
  Historical and Incremental Loads
  Develop Strategies for Extracting Data
  De-Duplication of Person and Organization
  Develop a Strategy for Dimension Distribution
  Updating Analysis Services Databases
  ETL System Physical Design
  System Architecture and Integration Services
  Staging Area
  Package Storage
  Package Naming Conventions
  Developing a Detailed Specification
  Summary

Chapter 6 Developing the ETL System
  Getting Started
  Create Solution, Project, and Data Sources
  Package Template
  Master Packages and Child Packages
  Dimension Processing
  Dimension Processing Basics
  Extract Changed Rows
  Slowly Changing Dimensions
  De-Duplication and the Fuzzy Transforms
  Fact Processing
  Extracting Fact Data
  Extracting Fact Updates and Deletes
  Cleaning Fact Data
  Checking Data Quality and Halting Package Execution
  Transforming Fact Data
  Surrogate Key Pipeline
  Loading Fact Data
  Analysis Services Processing
  Tying It All Together
  The Audit System
  The Master Package
  Package Event Handling
  Unit Testing
  Summary

Chapter 7 Designing the Analysis Services OLAP Database
  Why Analysis Services?
  Aggregation Management
  Aggregation Navigation
  Summarization Logic for Each Measure
  Query Performance
  Calculations
  Other Reasons for Using Analysis Services
  Why Not Analysis Services?
  Designing the OLAP Structure
  Getting Started
  Create a Project and a Data Source View
  Dimension Designs
  Creating and Editing Dimensions
  Creating and Editing the Cube
  Physical Design Considerations
  Storage Mode: MOLAP, HOLAP, ROLAP
  Designing Aggregations
  Partitioning Plan
  Planning for Deployment
  Historical Processing Plan
  Incremental Processing Plan
  Summary

Part Three Developing the BI Applications

Chapter 8 Business Intelligence Applications
  Business Intelligence Basic Concepts
  Standard Reports
  Analytic Applications
  BI Application Developers
  The Value of Business Intelligence Applications
  Delivery Platform Options
  The BI Application Development Process
  Application Specification
  Application Development
  Maintenance
  Summary

Chapter 9 Building the BI Application in Reporting Services
  A High-Level Architecture for Reporting
  Reviewing Business Requirements for Reporting
  Examining the Reporting Services Architecture
  Using Reporting Services as a Standard Reporting Tool
  Reporting Services Assessment
  Building and Delivering Reports
  Planning and Preparation
  Creating Reports
  The BI Portal
  Reporting Operations
  Summary

Chapter 10 Incorporating Data Mining
  Defining Data Mining
  Basic Data Mining Terminology
  Business Uses of Data Mining
  Roles and Responsibilities
  SQL Server Data Mining Architecture Overview
  The Data Mining Design Environment
  Build, Deploy, and Process
  Accessing the Mining Models
  Integration Services and Data Mining
  Additional Features
  Architecture Summary
  Microsoft Data Mining Algorithms
  Decision Trees
  Naïve Bayes
  Clustering
  Sequence Clustering
  Time Series
  Association
  Neural Network
  The Data Mining Process
  The Business Phase
  The Data Mining Phase
  The Operations Phase
  Metadata
  Data Mining Examples
  Case Study: Classifying Cities
  Case Study: Product Recommendations
  Summary

Part Four Deploying and Managing the DW/BI System

Chapter 11 Working with an Existing Data Warehouse
  The Current State of Affairs
  Data Quality
  Mart Madness
  Business Acceptance Disorder
  Infrastructure Disorder
  Political and Organizational Problems
  Perfect Health
  Conversion from SQL Server Relational Data Warehouse
  Integration Services
  Analysis Services
  Reporting Services
  Data Mining
  Integrating with Non-SQL Server 2005 Components
  Replacing the Relational Database
  Replacing Integration Services
  Replacing Analysis Services OLAP, Data Mining, or Reporting Services
  Using a Non-Microsoft Ad Hoc Query Tool
  Summary

Chapter 12 Security
  Identifying the Security Manager
  Securing the Hardware
  Securing the Operating System
  Securing the Development Environment
  Securing the Data
  Providing Open Access for Internal Users
  Itemizing Sensitive Data
  Securing Various Types of Data Access
  What Should You Do?
  Windows Integrated Security
  Analysis Services Security
  Relational DW Security
  Reporting Services Security
  Integration Services Security
  Usage Monitoring
  Protecting Privacy
  Summary

Chapter 13 Metadata Plan
  Metadata Basics
  The Purpose of Metadata
  The Metadata Repository
  Metadata Standards
  SQL Server 2005 Metadata
  Cross-Tool Components
  Relational Engine Metadata
  Analysis Services
  Integration Services
  Reporting Services
  External Metadata Sources
  Looking Forward on SQL Server Metadata
  A Practical Metadata Approach
  Creating the Metadata Strategy
  Business Metadata Reporting
  Process Metadata Reporting
  Technical Metadata Reporting
  Ongoing Metadata Management
  Summary

Chapter 14 Deployment
  System Deployment
  Pre-Deployment Testing
  Deployment
  Data Warehouse and BI Documentation
  Core Descriptions
  Additional Documentation
  Additional Functions
  User Training
  Training Development
  Training Delivery
  User Support
  Desktop Readiness and Configuration
  Summary

Chapter 15 Operations and Maintenance
  Providing User Support
  Maintaining the BI Portal
  Extending the BI Applications
  System Management
  Executing the ETL Packages
  Monitoring the Business Intelligence System
  Managing Disk Space
  Killing Queries
  Service and Availability Management
  Performance Tuning the DW/BI System
  Performance Tuning Analysis Services
  Managing Partitioning
  Backup and Recovery
  Summary

Part Five Extending the DW/BI System

Chapter 16 Managing Growth
  Lifecycle Iteration: Growing the DW/BI System
  Business Requirements and Project Management
  The Technology Track
  The Data Track
  The Applications Track
  Deployment, Maintenance, and Growth
  Marketing and Expectation Management
  The Stakeholders
  Quantitative Techniques
  Qualitative Techniques
  System Interconnection
  Downstream Systems
  Master Data
  BI Web Services
  Summary

Chapter 17 Real-Time Business Intelligence
  Making the Case For (and Against) Real-Time Data
  What Makes Delivering Real-Time Data Hard?
  What Makes Real-Time Data Valuable?
  What Should You Do?
  Executing Reports in Real Time
  Serving Reports from a Cache
  Sourcing a Report from an Integration Services Package
  Loading the DW/BI System in Real Time
  The Integrated Approach
  The Real-Time Layer
  Using Analysis Services to Deliver Real-Time Data
  Building Cubes from Normalized Data
  Proactive Caching
  Using Integration Services with Analysis Services in Real Time
  Summary

Chapter 18 Present Imperatives and Future Outlook
  The Big Risks in a DW/BI Project
  Phase I Requirements, Realities, Plans, and Designs
  Phase II Developing the Databases and Applications
  Phase III Deploying and Managing the DW/BI System
  Phase IV Extending the DW/BI System
  What We Like in the Microsoft BI Toolset
  Future Directions: Room for Improvement
  Query Tools
  Metadata
  Relational Database Engine
  Analysis Services
  Analytic Applications
  Integration
  Conclusion

Index