Investing in test data management (test data creation, storage, and provisioning) delivers better quality apps, at lower cost, with quicker time to market.
Table of Contents
Why Calculate Test Data Management ROI?
The Need for Test Data Management Tools
Top 8 Challenges of Test Data Provisioning
Quantified Benefits of Test Data Management
Test Data Management Benefits Not Quantified in the ROI Model
An Entity-Based Test Data Management Approach
Why Calculate Test Data Management ROI?
Today's challenging economic climate is driving companies to prioritize cost-cutting initiatives, with Return On Investment (ROI) meticulously examined before any investment is made. In software development, test data management is emerging as a top priority due to its ability to improve software quality, reduce testing costs, and accelerate software delivery.
This article provides a framework for justifying an investment in test data management, quantifying the expected ROI across 4 parameters:
- Reducing test data creation and provisioning costs, by automating 40-70% of the manual labor previously needed.
- Improving productivity and time to market for software development teams, by cutting application delivery cycle times by approximately 25%.
- Reducing test data storage and database costs, by centralizing test data stores, subsetting, and generating synthetic data for more compact and enriched test datasets.
- Reducing defect resolution costs by shifting testing left in the software development life cycle, that is, by testing earlier in the cycle and maximizing test coverage.
The ROI and payback calculations for a sample company implementing a test data management solution over 3 years are as follows:
- ROI: 329%
- NPV: $8,608K
- Payback period: 6 months
- Total benefits: $11,223K
See the test data management ROI report for a full breakdown of all calculations.
The Need for Test Data Management Tools
While many organizations have spent the last few decades improving and accelerating software development processes with agile DevOps methodologies, the need for test data management tools has often been overlooked. Software engineering and quality assurance teams often find provisioning test data an overly complicated and time-consuming process, since they must manually:
- Extract the necessary data from many different sources
- Transform (cleanse, format, and mask) the data for compliance
- Load the data into the relevant development and testing environments
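The three manual steps above can be sketched as a small pipeline. This is a minimal illustration only: the source names, the `email` field, the salt, and the in-memory `qa_env` list are all hypothetical stand-ins for real databases and environments, and the hashing shown is just one simple way to pseudonymize a value.

```python
import hashlib

def extract(sources):
    """Collect rows from every source into one list (stand-in for real queries)."""
    return [dict(row) for rows in sources.values() for row in rows]

def mask_email(email, salt="demo-salt"):
    """Replace an email with a stable pseudonym (salt is a hypothetical value)."""
    normalized = email.strip().lower()
    digest = hashlib.sha256((salt + normalized).encode()).hexdigest()[:12]
    return f"user_{digest}@example.test"

def transform(rows):
    """Cleanse (trim whitespace) and mask each row."""
    cleaned = []
    for row in rows:
        clean = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        clean["email"] = mask_email(clean["email"])
        cleaned.append(clean)
    return cleaned

def load(rows, environment):
    """Append the prepared rows to the target environment and report the count."""
    environment.extend(rows)
    return len(rows)

# Hypothetical sample data from two siloed sources.
sources = {
    "crm": [{"customer_id": 1, "email": " Ana@Corp.com "}],
    "billing": [{"customer_id": 2, "email": "bo@corp.com"}],
}
qa_env = []
loaded = load(transform(extract(sources)), qa_env)
```

Even in this toy form, each step is a separate function that someone has to write and maintain per source, which is the manual effort the section describes.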
According to the Software Engineering Leaders Survey by analyst firm Gartner, onboarding, training, and keeping talent happy ranks as one of the top challenges facing data-intensive companies today.
A shift-left testing approach calls for testing earlier in the development cycle. Early testing reduces the risk of costly rework, by locating defects and correcting them sooner, rather than later. According to the US National Institute of Standards and Technology, the cost to fix a defect found during the testing phase is about 15 times greater than one found in the design phase. And the cost to correct a defect found during deployment is about 30-100 times greater than one found during design.
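To make the multipliers concrete, the sketch below applies them to a hypothetical batch of defects. The defect counts and the base cost of 100 per design-phase fix are invented for illustration; only the relative factors (1x, 15x, 30-100x) come from the figures cited above, and the deployment case uses the low end of the range.

```python
# Relative cost of fixing a defect, by the phase in which it is found.
RELATIVE_FIX_COST = {"design": 1, "testing": 15, "deployment": 30}  # 30 = low end of 30-100

def rework_cost(defects_by_phase, design_fix_cost):
    """Total rework cost for defects grouped by the phase where they were found."""
    return sum(
        count * RELATIVE_FIX_COST[phase] * design_fix_cost
        for phase, count in defects_by_phase.items()
    )

# Hypothetical scenario: the same 20 defects, found late versus found early.
late = rework_cost({"testing": 10, "deployment": 10}, design_fix_cost=100)
early = rework_cost({"design": 10, "testing": 10}, design_fix_cost=100)
savings = late - early
```

Under these assumed numbers, finding ten of the defects one phase earlier cuts rework cost from 45,000 to 16,000.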
Test data management supports shift-left testing by providing quick access to test data, optimizing test data operations, and maintaining data privacy and security. Combined with shift-left testing, test data management leads to earlier defect detection, enhanced software quality, and accelerated time to market.
Top 8 Challenges of Test Data Provisioning
The most common provisioning challenges that test data preparation tools must address are:
- Sourcing the test data
  Enterprise data is often fragmented and siloed across scores of data sources and stored using different technologies and formats, making it hard for testers to get the data they need for each test. One researcher found that QA engineers spend 46% of their time locating and preparing test data.
- Achieving full test coverage
  Test coverage is a metric used to measure how much of an application’s code is exercised by test cases. Ensuring you have the test data you need to fully execute your pre-defined test cases is critical, since low test coverage tends to go hand in hand with high defect density.
- Reducing false positives and negatives
  When test data is badly designed, it often leads to false positive errors, resulting in wasted time and effort dealing with non-existent defects. When there’s not enough test data, false negatives can happen, which can impact the quality and reliability of the application.
- Reusing (versioning) test data
  By versioning datasets, it becomes possible to rapidly rerun tests to validate that the defects that were discovered in a previous test were fixed. Versioning is also essential for regression testing on the same data.
- Subsetting test data
  Subsetting allows DevOps test data management teams to identify and extract an accurate test data subset to activate specific test scenarios with 100% coverage. It also enables them to reduce the quantity of test data (and the need for related hardware and software).
- Protecting test data
  Data privacy regulations, such as CPRA, GDPR, and LGPD, require that Personally Identifiable Information (PII), meaning sensitive data that can be used to identify someone (e.g., name, Social Security Number, driver’s license, or email address), undergo data de-identification or data anonymization within the test environment. Discovering and masking all PII, while ensuring the relational integrity of all test data across all systems, is time-consuming and labor-intensive for data teams.
- Enforcing referential integrity
  Referential integrity is about the consistency of data across database tables. For instance, when a foreign key value is used in a table, it must reference a valid, existing primary key in the parent table. Assuring referential integrity of test data across databases is critical to the validity of the data and becomes even more difficult to enforce after data masking.
- Preventing QA data collisions
  Sometimes testers inadvertently override one another’s test data, causing corrupted test data, as well as wasted time and effort. In such cases, test data must be reprovisioned, and tests must be rerun.
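The interaction between masking and referential integrity described in the list above can be illustrated with a short sketch. It assumes a keyed hash (HMAC) as the masking function, so the same input always yields the same masked value; this is one common way to keep keys aligned across tables, not necessarily how any particular tool works. The table layouts, column names, and the key are hypothetical.

```python
import hashlib
import hmac

SECRET = b"demo-key"  # hypothetical masking key; a real one would be kept out of code

def mask(value):
    """Deterministically mask a value: equal inputs give equal outputs."""
    return hmac.new(SECRET, str(value).encode(), hashlib.sha256).hexdigest()[:10]

def mask_column(rows, column):
    """Return copies of the rows with one column masked."""
    return [{**row, column: mask(row[column])} for row in rows]

def orphans(child_rows, fk, parent_rows, pk):
    """Child rows whose foreign key matches no primary key in the parent table."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]

# Hypothetical parent and child tables keyed on a sensitive identifier.
customers = [{"ssn": "111-22-3333"}, {"ssn": "444-55-6666"}]
orders = [
    {"order_id": 1, "customer_ssn": "111-22-3333"},
    {"order_id": 2, "customer_ssn": "444-55-6666"},
]

masked_customers = mask_column(customers, "ssn")
masked_orders = mask_column(orders, "customer_ssn")

# Masking only the parent table breaks every reference ...
broken = orphans(orders, "customer_ssn", masked_customers, "ssn")
# ... while masking both sides with the same function keeps them intact.
intact = orphans(masked_orders, "customer_ssn", masked_customers, "ssn")
```

Here `broken` contains both orders and `intact` is empty, which shows why masking has to be applied consistently across every system that shares a key.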
Quantified Benefits of Test Data Management
The quantified test data management benefits (over 3 years) are summarized below:
| Cash Flow | Setup | Year 1 | Year 2 | Year 3 | Total |
|---|---|---|---|---|---|
| Costs ($K) | 276 | 779 | 779 | 779 | 2,615 |
| Total benefits ($K) | | 2,104 | 3,340 | 5,778 | 11,223 |
| Net Present Value (NPV) ($K) | | 1,048 | 2,560 | 4,998 | 8,608 |
| ROI (percent) | | | | | 329% |
| Payback (months) | | | | | 6 |
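The arithmetic behind the table can be checked with a short sketch. It assumes ROI is computed as net benefits divided by total costs, and that setup costs fall in Year 1; the article does not state either formula or a discount rate, so treat this as one reading that happens to reproduce the published 329% rather than the report's actual method. The sums differ from the published totals by 1-2 ($K), presumably due to rounding in the source.

```python
# Figures in $K, taken from the cash flow table.
setup_cost = 276
yearly_costs = [779, 779, 779]
yearly_benefits = [2104, 3340, 5778]

# Assumption: the setup cost is charged against Year 1.
net_by_year = [
    benefit - cost - (setup_cost if year == 0 else 0)
    for year, (benefit, cost) in enumerate(zip(yearly_benefits, yearly_costs))
]

total_costs = setup_cost + sum(yearly_costs)
total_benefits = sum(yearly_benefits)
net_benefits = total_benefits - total_costs

# Assumed definition: ROI = net benefits / total costs.
roi_percent = 100 * net_benefits / total_costs
print(net_by_year, total_costs, total_benefits, round(roi_percent))
```

Under these assumptions, the per-year net figures land within 1 ($K) of the row labeled NPV, which suggests that row may be undiscounted; the full report would be needed to confirm.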
Test Data Management Benefits Not Quantified in the ROI Model
In addition to the benefits quantified above, an organization using test data management software would also benefit from:
- Faster time to market for key business applications, leading to earlier revenue intake
- Improved DevOps Research and Assessment (DORA) metrics
- Better developer and QA experience and talent retention
- Lower regulatory compliance costs and avoidance of penalties
- Reduced carbon footprint and emissions
An Entity-Based Test Data Management Approach
An entity-based test data management approach ingests and organizes data via business entities (customer, employee, device, order, etc.) into a test data store while compressing and masking the data and enforcing referential integrity. Testing teams can then provision compliant subsets to their target environments, on demand.
Entity-based test data management accelerates application delivery by instantly creating test data from production and generating synthetic test data when needed. It can also move test datasets from one test environment to another, between sprints.
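As a rough illustration of organizing data by business entity, the sketch below gathers every row belonging to one customer across several tables and provisions a subset for chosen customers. The table names, the shared `customer_id` column, and the function names are hypothetical; a real implementation would also mask and compress the data, which is omitted here.

```python
def build_entity(customer_id, tables):
    """Gather every row that belongs to one customer across all tables."""
    return {
        name: [row for row in rows if row["customer_id"] == customer_id]
        for name, rows in tables.items()
    }

def provision(customer_ids, tables):
    """Build a test data subset containing only the requested customer entities."""
    return {cid: build_entity(cid, tables) for cid in customer_ids}

# Hypothetical source tables that all carry a customer_id column.
tables = {
    "customers": [
        {"customer_id": 1, "segment": "retail"},
        {"customer_id": 2, "segment": "business"},
    ],
    "orders": [
        {"order_id": 10, "customer_id": 1},
        {"order_id": 11, "customer_id": 2},
        {"order_id": 12, "customer_id": 1},
    ],
    "devices": [{"device_id": 7, "customer_id": 2}],
}

subset = provision([1], tables)
```

Because each entity carries all of its related rows, a subset built this way keeps parent and child records together, which is what lets it be moved between environments as a unit.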
Additional benefits of an entity-based test data management strategy include:
- Increased test coverage
- Improved tester productivity
- Reduced test duration, and quicker time-to-market
- Greater efficiency, by decommissioning redundant testing environments (HW and SW)
- Enhanced data protection
- Zero impact on current systems and operations