Information Economics Improve information economics, cut costs and reduce risks 2014 IBM Corporation
Enterprise big data dilemma
Data capacity in enterprises is growing on average 40 percent to 60 percent year over year. Contributing factors include an explosion in unstructured data, such as email and documents, that must be stored because of regulatory requirements that continue to evolve and change.
SOURCE: Computerworld, "Data growth remains IT's biggest challenge, Gartner says," Lucas Mearian, November 2, 2010.
"Storage is cheap" is no longer the answer
Growth is out of control
[Chart: projected growth in storage volume, 2011 to 2019, at 40%-60% annual growth]
- Storage costs are consuming the IT budget
- Supply-side initiatives are only a temporary stop-gap: virtualization, over-allocation
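To make the 40%-60% annual growth figure concrete, the following is a minimal sketch of compound growth over the chart's 2011 to 2019 range. The 1 PB starting volume and the function name `project_volume` are illustrative assumptions, not figures from the deck.

```python
def project_volume(start_volume, annual_growth, years):
    """Return the volume at the end of each year, starting with year 0."""
    volumes = [start_volume]
    for _ in range(years):
        volumes.append(volumes[-1] * (1 + annual_growth))
    return volumes


# Hypothetical baseline: 1 PB in 2011, projected through 2019 (8 years).
low = project_volume(1.0, 0.40, 8)
high = project_volume(1.0, 0.60, 8)

for year, (low_pb, high_pb) in zip(range(2011, 2020), zip(low, high)):
    print(f"{year}: {low_pb:6.2f} PB to {high_pb:6.2f} PB")
```

Under these assumptions, eight years of compounding multiplies the starting volume roughly 15 times at 40% growth and roughly 43 times at 60% growth.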
This growth represents an enormous challenge to IT organizations
1. Cost: $14 million to $20 million annually to store, administer and back up 1 PB of data
2. Risk: highly classified information; personal information; data subject to legal review
3. Value: data center transformation; big data analytics; acquisitions and divestitures
Excess information = higher cost and greater risk
Dispose of unnecessary data = reduce cost and risk
- Legal hold (collect and retain evidence): 1%
- Regulatory record keeping (hold and dispose of data): 5%
- Business value (archive records for value): 25%
- Information without duty and value (unnecessary data): 69%
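The shares above can be combined with the deck's figure of $14 million to $20 million per petabyte per year to estimate what unnecessary data costs. This is a rough sketch only: it assumes cost scales linearly with volume, and the function name `avoidable_cost` is hypothetical.

```python
# Shares of stored information, in percent, as stated on the slide.
SHARES = {
    "legal_hold": 1,
    "regulatory_record_keeping": 5,
    "business_value": 25,
    "unnecessary": 69,
}


def avoidable_cost(cost_per_pb_musd, petabytes, unnecessary_pct=SHARES["unnecessary"]):
    """Annual cost in $ million attributable to unnecessary data.

    Assumption: cost is proportional to volume, which is a simplification.
    """
    return cost_per_pb_musd * petabytes * unnecessary_pct / 100


low = avoidable_cost(14, 1)
high = avoidable_cost(20, 1)
print(f"Unnecessary data per PB: ${low:.2f}M to ${high:.2f}M per year")
```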
"The best way to reduce the amount of data: delete it."
Sheila Childs, research VP, Gartner
Where do we start?
Litigation & legacy data clean-up
Data Volume Reduction to Lower Matter Costs
Need to cut total information volume and related costs in order to systemically cut eDiscovery processing and review volume and costs.
- ~4 PB of data
- Over 2 billion discoverable pages
- Growing 27% every year (average)
A Court-Vetted Solution With Numerous Client Wins
Supported the largest litigation case in the world by identifying, collecting and analyzing 132 TB of data to produce 200 GB of relevant data.
PROBLEM: For the Deepwater Horizon matter, look across 132 TB of data on 3 continents and in 8 locations. Collect 1 TB to a preservation location in Houston. Apply full-text indexing and additional search terms to reduce the collection to the smallest defensible data set, which was sent out for production review by outside counsel. The final data set was approximately 200 GB.
SOLUTION: Enable a 100:1 reduction in the collection process in less than 2 weeks.
ROI: Saved millions of dollars; responded to every DOJ request; substantially lowered outsourced review costs; and built a defensible audit trail.
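The reduction at each stage of this matter can be worked out from the volumes quoted. The sketch below assumes decimal units (1 TB = 1000 GB); the variable and function names are illustrative.

```python
GB_PER_TB = 1000  # assumption: decimal units


def reduction_ratio(before_gb, after_gb):
    """How many units of input remain as one unit of output."""
    return before_gb / after_gb


searched_gb = 132 * GB_PER_TB   # data looked across
collected_gb = 1 * GB_PER_TB    # data collected to the preservation location
produced_gb = 200               # final data set sent for production review

collection = reduction_ratio(searched_gb, collected_gb)
review = reduction_ratio(collected_gb, produced_gb)
overall = reduction_ratio(searched_gb, produced_gb)

print(f"Collection: {collection:.0f}:1, review set: {review:.0f}:1, overall: {overall:.0f}:1")
```

On these numbers the collection step alone is about 132:1, which is the same order of magnitude as the 100:1 reduction quoted on the slide.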
Use Cases for Legacy Data Clean-up
- Clean up ROT data (redundant, obsolete and trivial data): Boolean search
- Remediate regulated data (PII, PCI, HIPAA, HR, financial, records): pattern match and classify
- Secure high business value data (IP, pricing, sales and market, patent, planning): Bayesian classification
Legacy data no longer has value but creates cost and risk
Legacy data is:
- Data that has aged past its usefulness
- Data that is consuming COST without providing value to the organization
- Data of low value that still carries RISK in legal actions
What drives organizations to remediate legacy data?
- Mergers, acquisitions and divestitures: organizational change often requires the inspection and clean-up of legacy data
- Growth and capacity: out-of-control growth and the cost associated with it are driving action in IT organizations
- Storage migrations: remediating data before migration reduces cost, risk of project failure and regulatory exposure
How do I consolidate useful data? Where is corporate intellectual property? Which data do employees actually use? What is stored on my systems?
- Consolidate or remove acquired data
- Secure intellectual property and confidential information
- Remediate out-of-policy file types
- Remove data that has aged past its value
- Remove out-of-policy data types
- Remediate obsolete data by age
- Remediate legacy data to reduce data sizes
- Recover free storage capacity
- Migrate only the data that matters
A large global bank consolidated acquired bank data from thousands of desktops in order to comply with a regulatory mandate.
StoredIQ Solution
Clean up legacy data
The StoredIQ Platform and Solutions
[Diagram: platform layers]
- DASHBOARDS: use-case-specific dashboards (e.g. eDiscovery, policy)
- Subject matter expert use cases: EDISCOVERY, DATA CLEAN-UP, COMPLIANCE, RETENTION
- Data expert: DATAIQ (identify, analyze and act)
- ACTIVE INTELLIGENCE
- IT expert: ADMINIQ
- PLATFORM
- BIG DATA
- Data sources: Archive Platform, ECM, Forensic Images/Tapes, File Servers, Email Servers, Desktops, SharePoint & Enterprise Collaboration, Cloud Media
Connect to Data in its Native Location
Support for 75+ data sources and 450+ file types
Govern in Place
Look at data wherever it lives across your organization
- Identify content that has value to the different stakeholders of your business and act on it appropriately
- Identify content that has negative value to your business and delete it (or move it)
Data sources: Archive Platform, ECM, Forensic Images/Tapes, File Servers, Email Servers, Desktops, SharePoint & Enterprise Collaboration, Cloud Media
StoredIQ Approach: Data about Data
BIG DATA INTELLIGENCE
Advanced visualizations show what types of data are stored across your enterprise
Discover where your oldest or least used data resides
Utilize intelligent overlays to spot potential issues
Key benefits
- Find the data that matters: properly discover, classify and manage information according to business value to reduce risk and cost
- Get rid of old, obsolete data: delete nonbusiness, aged and obsolete data to reduce data volume
- Identify sensitive and toxic content: find misplaced client data, PCI and privacy-regulated data
- Stratify information to accelerate: cloud migration; investigations; post-acquisition and merger data integration
- Business process readiness: practice proactive audit, investigation or disclosure and discovery readiness
ROT Clean Up
Remove Redundant, Obsolete & Trivial Data
Identify data in place, across multiple repositories, AND take action
- Identify data in 75+ repositories
- Categorize data based on metadata and full text
- Create rule sets for data action
- Act on data to secure or delete it
Three-phase reduction of ROT data
Phase 1: Remove trivial data
- Data that never had any value to the organization
- Data that violates acceptable use policy
- Multimedia files (audio, video and images)
- Temporary files
Phase 2: Move obsolete data for timed disposal
- Orphaned files (employee has left the company)
- Log files
- Files past their longest departmental retention or business value
Phase 3: Remove departmental duplicates
- Create a departmental master copy area (ensure that all department employees have access)
- Identify master copies and place them in the correct area
- Deduplicate the remaining share against the master area
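The three phases can be read as an ordered set of rules applied to each file on a share. The following is a minimal sketch under stated assumptions, not StoredIQ's implementation: the extension list, the 7-year retention period, the `FileRecord` fields and the use of SHA-256 content hashes for duplicate detection are all illustrative choices. It expects records from the remaining share, not from the master area itself.

```python
import hashlib
import os
from dataclasses import dataclass
from datetime import date

# Hypothetical policy values; a real rule set would come from the organization.
TRIVIAL_EXTENSIONS = {".mp3", ".mp4", ".avi", ".jpg", ".png", ".tmp"}
RETENTION_YEARS = 7


@dataclass
class FileRecord:
    path: str
    owner_active: bool   # False when the owning employee has left
    last_modified: date
    content: bytes


def digest(content):
    """Content hash used to compare a file against the master copy area."""
    return hashlib.sha256(content).hexdigest()


def classify(record, today, master_hashes):
    ext = os.path.splitext(record.path)[1].lower()
    # Phase 1: trivial data (multimedia and temporary files).
    if ext in TRIVIAL_EXTENSIONS:
        return "phase 1: remove"
    # Phase 2: orphaned files, log files, files past retention.
    age_years = (today - record.last_modified).days / 365.25
    if not record.owner_active or ext == ".log" or age_years > RETENTION_YEARS:
        return "phase 2: move for timed disposal"
    # Phase 3: duplicates of a departmental master copy.
    if digest(record.content) in master_hashes:
        return "phase 3: remove duplicate"
    return "keep"


today = date(2014, 3, 11)
master_hashes = {digest(b"Q4 price list")}
share = [
    FileRecord("share/party.mp4", True, date(2013, 1, 1), b"video"),
    FileRecord("share/app.log", True, date(2014, 1, 1), b"log lines"),
    FileRecord("share/old_plan.doc", True, date(2005, 1, 1), b"plan"),
    FileRecord("share/leaver.doc", False, date(2013, 6, 1), b"notes"),
    FileRecord("share/prices_copy.doc", True, date(2013, 6, 1), b"Q4 price list"),
    FileRecord("share/new.doc", True, date(2014, 1, 1), b"unique content"),
]
results = {r.path: classify(r, today, master_hashes) for r in share}
for path, decision in results.items():
    print(f"{path}: {decision}")
```

Applying the phases in order means a file is only hashed and compared once it has survived the cheaper metadata checks.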
IBM can manage data in a proactive fashion, allowing companies to retain information according to corporate governance and regulatory mandates while disposing of unnecessary data with confidence
Large financial services organization: 30% reduction in data footprint
1. Problem: risk and cost associated with stale and nonbusiness data in the organization
2. Solution: StoredIQ, an IBM Company, identified non-business-related data and enabled the client to create custom classifications for automating file plans and implementing record-retention schedules
3. Return on investment: reduced storage costs through routine data destruction, reduced risk associated with nonbusiness data and ensured overall litigation preparedness
The Business Case for ROT removal
Align to a trigger event or operational change to maximise returns:
- ILG transformation: provide a cleansing program ahead of the ILG cycle to deliver a quick win and ROI
- Data centre transformation: content moving to cloud, an off-premise provider, a new data centre or a hardware migration
- Storage provision approaching capacity: upgrade or expansion being reviewed
- IM strategy change: cleanse wild content ahead of the client's move to new IM infrastructure such as SharePoint or another collaborative or ECM platform
Provides multiple cost savings:
- For clients near storage capacity, new storage acquisition can be significantly deferred
- For clients embarking on a transformation, the data must be moved, so reducing the content footprint drives down:
  - Volume of data to move
  - Time to move data, due to reduced volume
  - Risk, as less time to move data reduces business disruption and scope for error
  - Cost: data movement is charged on volume and time, so less volume means less time and a compound saving; less volume also requires a smaller new-state provision and therefore less capital outlay
Regulated Data
Identify and remediate regulated data
Identify regulated data in place, across multiple repositories, AND take action
- Identify data in 75+ repositories
- Categorize and classify data based on Boolean search, patterns and machine learning
- Create rule sets for data action
- Act on data to secure, report, copy, move or delete
Find and remediate privacy issues
- Personally identifiable information (PII): Social Security numbers, driver's license numbers, government-issued identification numbers
- Highly confidential information (HCI): pricing information, engineering, planning, strategy documents
- Payment Card Industry (PCI) data: credit cards of any type
Are there Social Security numbers on my file shares?
Is customer information being stored inappropriately?
Is confidential company data at risk?
A large energy company identified 21 types of PII, HCI and PCI data in its native location and remediated 17 percent of data.
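As an illustration of pattern matching for PII and PCI data, the sketch below scans text for US Social Security number formats and for candidate payment card numbers, using the Luhn checksum to discard digit runs that cannot be card numbers. The regular expressions and function names are assumptions chosen for the example; they are deliberately simple and are not the patterns used by any particular product.

```python
import re

# Illustrative patterns only; production patterns would be stricter.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13 to 19 digits, optional separators


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan(text):
    """Return (category, matched text) pairs for suspected PII and PCI data."""
    hits = [("PII", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        if luhn_valid(m.group()):
            hits.append(("PCI", m.group()))
    return hits


sample = "SSN 123-45-6789, card 4111 1111 1111 1111, order 1234 5678 9012 3456"
hits = scan(sample)
print(hits)
```

The order number in the sample has the shape of a card number but fails the checksum, so it is not reported. A checksum step of this kind is one way to reduce false positives before human review.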
The Customer Requirement for Regulated Data identification and remediation
The IBM solution analyses data at rest, proactively reviewing the sensitive nature of content where it resides and providing early awareness of inappropriately secured content. Awareness of potential exposure prior to leakage enables organizations to remediate risks.
PROBLEM: Content containing details such as PCI, PII, HCI and privacy data can reside anywhere throughout a network. Traditional DLP tools protect information on the move but don't provide insight into suspect documents at rest.
- More data is in unstructured format than ever before
- Global privacy laws are becoming high profile
- Content is far more transient and, as such, harder to track
- Organizations are unaware of sensitive content residing outside of expected security protocols
- PCI breaches, for example, could lead to loss of the facility to take payment by payment card
Staysure Insurance: 93,000 customer credit card details, including CVV numbers, stolen
Process - Regulated Data identification and remediation
- Define priority on regulated content requiring identification
- Build patterns and query strings to highlight suspect occurrences
- Validate accuracy, such as the number of false positives, until the confidence threshold is met
- Identify and index high-risk content sources: email, desktops, home drives, collaboration systems
- Schedule incremental updates to maintain diligence
- Run heat maps and reports of potential violations
- Quarantine and/or delete offending items
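The validation step can be sketched as measuring precision (the share of hits that reviewers confirm) on a reviewed sample and tightening the pattern until a threshold is met. Everything below is hypothetical: the sample documents, the two candidate patterns, the 0.9 threshold and the function names.

```python
import re


def precision(hits, confirmed):
    """Share of pattern hits that reviewers confirmed as true violations."""
    if not hits:
        return 0.0
    return sum(1 for doc_id in hits if doc_id in confirmed) / len(hits)


def first_pattern_meeting_threshold(candidates, sample, confirmed, threshold):
    """Try candidate patterns in order and return the first that is precise enough."""
    for name, pattern in candidates:
        hits = [doc_id for doc_id, text in sample.items() if pattern.search(text)]
        if precision(hits, confirmed) >= threshold:
            return name, hits
    return None, []


# Hypothetical reviewed sample: document id -> text.
sample = {
    "a": "Employee SSN 123-45-6789",
    "b": "Invoice number 987654321",
    "c": "SSN on file: 222-33-4444",
    "d": "Part no. 555443333",
}
confirmed = {"a", "c"}  # documents reviewers marked as real violations

candidates = [
    ("loose", re.compile(r"\d{3}-?\d{2}-?\d{4}")),      # also matches any 9-digit run
    ("strict", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),   # requires the hyphenated form
]

chosen, hits = first_pattern_meeting_threshold(candidates, sample, confirmed, 0.9)
print(chosen, hits)
```

In this sample the loose pattern flags all four documents, half of them false positives, so the stricter pattern is the first to clear the threshold. Precision alone does not show what a strict pattern misses, so a fuller validation would also track recall.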
The Business Case for Regulated Data identification and remediation
Identify and remediate regulated data to reduce RISK:
- Risk of litigation, e.g. mishandling personnel data
- Risk associated with lack of compliance: the regulator does not accept a lack of robust information governance processes
- Risk of data leakage: commercial risk resulting from loss of documents that contain intellectual property and confidential information
- Reputation risk: associated with loss of information that could be embarrassing to the organization
High Business Value
Identify and remediate data with high business value
Classify data to identify business value
Use a combination of rules and machine learning to identify and classify data of business value, making it readily available for stakeholder needs and data analytics
[Diagram: Filter 1, Filter 2, Filter 3, Action; axes: Volume, Relevance]
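One way to read the filter diagram is as a chain of filters applied in sequence, with the volume of candidate data recorded after each stage. The sketch below assumes three hypothetical filters (file type, age, keyword) and made-up sample items; the deck does not say what the filters are.

```python
def run_funnel(items, filters):
    """Apply filters in order; return the survivors and the volume after each stage."""
    volumes = [len(items)]
    for keep in filters:
        items = [item for item in items if keep(item)]
        volumes.append(len(items))
    return items, volumes


# Hypothetical sample data.
docs = [
    {"name": "a.docx", "age_days": 100, "text": "Supply contract with ACME"},
    {"name": "b.mp3", "age_days": 50, "text": ""},
    {"name": "c.docx", "age_days": 4000, "text": "Old contract"},
    {"name": "d.xlsx", "age_days": 30, "text": "Lunch rota"},
    {"name": "e.pdf", "age_days": 10, "text": "Contract renewal terms"},
]

filters = [
    lambda d: d["name"].endswith((".docx", ".xlsx", ".pdf")),  # filter 1: document types
    lambda d: d["age_days"] <= 7 * 365,                        # filter 2: within 7 years
    lambda d: "contract" in d["text"].lower(),                 # filter 3: keyword
]

survivors, volumes = run_funnel(docs, filters)
print(volumes, [d["name"] for d in survivors])
```

Ordering the cheapest filters first keeps the more expensive content checks to the smallest possible set.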
The Customer Requirement for High Business Value Data identification and remediation
The IBM solution provides extensible reach across dark data silos in order to classify content, utilizing industry-leading search, pattern matching and machine learning technology to find and remediate high-value data.
PROBLEM: Users have persistently failed to classify and store business records consistently and diligently. Records remain unregistered in email, desktops and file shares alike, exposing companies to failures to comply with regulatory and legislative responsibilities.
- User classification is not consistent, and records are being left in the wild
- Intellectual property is being lost, costing organizations dearly in lost revenues and lost time to market
Barclays: $3.75m fine by FINRA for systemic email and record retention failure
What typical high-value items do we look for?
- Intellectual property
- Contracts
- Financial backup
- Customer data
- Design documents
- Pricing
- HR data
- Hiring data
- Tax work product
- Investor data
- Patent information
- Sales and marketing data
All are critical to the smooth running of the organization and are subject to corporate governance and control
Process - High Business Value Data identification and remediation
- Collate the high-value retention plan and data types
- Target key data types that expose the highest value or risk to the business
- Prioritise key record types to scope in: do not attempt 100% coverage across hundreds of file series; 30 to 200 is realistic
- Define acceptable thresholds of accuracy
  - Coach records managers who will only engage for 100% coverage of the file plan at 100% accuracy
  - Current employees are not classifying; as such, 100% coverage is too high a bar
  - Studies show that user classification can be as low as 60% accurate and extremely inconsistent, especially between individuals. Set a realistic bar
- Train classification models: use the appropriate level of complexity. Sometimes keywords work; only afterwards expand to pattern matching and machine learning techniques
- Copy/move identified data to the repository of record
IBM, 3/11/2014
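The advice to start with keywords and only then escalate to machine learning can be sketched as a two-stage classifier: a keyword lookup first, with a small naive Bayes model as the fallback. Naive Bayes is used here because the deck mentions Bayesian classification; the keyword table, training sentences and all names are illustrative assumptions, and the model is a teaching-sized sketch with add-one smoothing.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keyword rules mapping to record types.
KEYWORDS = {"contract": "Contracts", "patent": "Patent information"}


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in text.lower().split():
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify_record(text, model):
    """Use the keyword rule when one fires; otherwise fall back to the model."""
    lowered = text.lower()
    for keyword, label in KEYWORDS.items():
        if keyword in lowered:
            return label, "keyword"
    return model.classify(text), "model"


model = NaiveBayes()
model.train("quarterly price list and discount schedule", "Pricing")
model.train("unit price increase for new customers", "Pricing")
model.train("job offer and interview feedback", "Hiring data")
model.train("candidate interview schedule", "Hiring data")

by_keyword = classify_record("Signed contract with supplier", model)
by_model = classify_record("interview notes for the candidate", model)
print(by_keyword, by_model)
```

Returning the stage that made the decision alongside the label makes it easier to measure accuracy per stage against the agreed threshold.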
Identify high-value data in place, across multiple repositories, AND take action
- Identify data in 75+ repositories
- Classify data based on Boolean search, pattern matching and machine learning
- Create rule sets for data action
- Act on data to secure, report, copy, move or delete
Identify, analyze and act on the data that matters
- Mergers and acquisitions: identify data across more than 75 data source types to be consolidated, protected or remediated. How do I consolidate useful data?
- Divestitures: identify classified and copied corporate intellectual property prior to divestiture of business units. Where is corporate intellectual property?
- Storage migration: identify and segment data based on type, age and last-accessed date prior to storage migration. Which data do employees actually use?
A large global bank consolidated acquired bank data from thousands of desktops to comply with a regulatory mandate.
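Segmenting data before a storage migration can be sketched as bucketing files by type and by how recently they were accessed. The one-year staleness cut-off, the sample files and the function name `segment` are assumptions for illustration; file age (creation or modification date) could be bucketed in the same way.

```python
import os
from collections import Counter
from datetime import date


def segment(path, last_accessed, today, stale_after_days=365):
    """Return (file type, status) where status is 'active' or 'stale'."""
    ext = os.path.splitext(path)[1].lower() or "(none)"
    idle_days = (today - last_accessed).days
    status = "stale" if idle_days > stale_after_days else "active"
    return ext, status


# Hypothetical inventory: (path, last-accessed date).
today = date(2014, 3, 11)
files = [
    ("hr/policy.docx", date(2014, 2, 1)),
    ("hr/old.docx", date(2010, 5, 1)),
    ("eng/build.log", date(2012, 1, 1)),
    ("eng/README", date(2014, 3, 1)),
]

summary = Counter(segment(path, accessed, today) for path, accessed in files)
to_migrate = [path for path, accessed in files
              if segment(path, accessed, today)[1] == "active"]

print(dict(summary))
print(to_migrate)
```

The summary answers "which data do employees actually use?" per file type, and the active list is a first cut of what to migrate.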
The Business Case for High Business Value Data identification and remediation
Leakage of business data can:
- Create a competitive disadvantage
- Open the company to legal action
- Cause embarrassment or loss of reputation in the press
- Cause loss of productivity for employees impacted by a leak
IBM can help reduce the risk to high-value data by:
- Classifying and identifying high-value data across more than 75 repositories
- Reporting on and securing high-value data to ensure it is stored appropriately
- Creating a continuous process for identifying and managing high-value data
- Ensuring that usage guidelines are adhered to
Thank you.