1. Show Me the Test Data
2. Run for Cover
3. Quit Bugging Me
4. Stop Reinventing the QA Wheel
5. Find the Needle in the Test Data Haystack
6. Begin Breaking Bad...Data
7. Assure All-You-Can-Eat Data Ingestion
8. ID Sensitive Data Automatically
9. Mask Data on the Fly
10. Maintain Referential Integrity
11. Keep Private Data Private yet Accessible
12. Test New Releases BEFORE they are Released
13. Move Test Data TO and FROM any Environment
14. Deliver Test Data Instantly
15. Prevent QA Data Collisions
16. Automate, Automate, Automate
Enterprise data is often siloed and fragmented across numerous data sources scattered throughout the organization. Using production data is the path of least resistance, but data protection and privacy legislation often limits its use in development and testing environments. Testers are challenged to obtain data with the appropriate characteristics for each test. According to analyst firm Gartner, QA engineers spend 46% of their time searching for, analyzing, and preparing test data.
A word of advice: Use test data automation to enable your testers to concentrate on doing the job they were hired to do – test software, not prepare data.
Test coverage measures the percentage of the application code exercised by test cases. Coverage will be low if the test data represents only a small percentage of production data, in both volume and variety. Even production data typically provides only a fraction of the functional coverage required to fully test an application.
With better software testing coverage, you can reap more test data management benefits:
Software defect density refers to the number of defects or bugs found in a piece of software code, typically measured per unit of code size (for example, per thousand lines of code) or per function point.
When test data is poorly designed or insufficient, it may not test all possible scenarios or paths through the code. This can lead to defects going unnoticed and remaining in the code, resulting in a higher defect density.
Continuous testing helps identify defects early in the software development lifecycle by allowing developers to test their code more frequently and more accurately, ultimately reducing defect density.
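For illustration, here is a minimal sketch of how defect density might be computed when defects are counted per thousand lines of code (KLOC); the function name and inputs are hypothetical.

```python
def defect_density(defect_count: int, lines_of_code: int) -> float:
    """Return defects per thousand lines of code (KLOC).

    Hypothetical helper: defect_density = defects / (LOC / 1000).
    """
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defect_count / (lines_of_code / 1000)

# Example: 18 defects found in a 45,000-line module -> 0.4 defects per KLOC
print(defect_density(18, 45_000))  # 0.4
```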
The reusability (versioning) of test data leads to enhanced:
Test data subsetting is the process of dividing a given set of test data into smaller subsets. A subset is selected from the original test data to represent a specific scenario or part of the system under test, either to reduce the size of the test data set or to create specialized test cases that focus on specific aspects of the software. Subsetting delivers:
A word of advice: Subset test data based on business rules, without scripting. Ideally, test data subsets should hide the technical complexity of underlying source systems, preventing the need to know which databases/tables/columns contain the required data, and ensuring referential consistency.
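The advice above is to subset declaratively, without scripting; the sketch below only illustrates the underlying idea. It uses an in-memory SQLite database with hypothetical customers and orders tables, applies an illustrative business rule ("EU customers only"), and keeps only the child rows that reference the selected parents, so referential consistency is preserved.

```python
# Minimal sketch of rule-based subsetting with referential consistency.
# Table names, columns, and the "EU only" rule are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL,
                            FOREIGN KEY (customer_id) REFERENCES customers(id));
    INSERT INTO customers VALUES (1, 'Alice', 'EU'), (2, 'Bob', 'US'), (3, 'Chen', 'EU');
    INSERT INTO orders    VALUES (10, 1, 99.0), (11, 2, 45.0), (12, 3, 12.5);
""")

# Build the subset: EU customers, plus only the orders that reference them,
# so no child row points at a parent that was left out of the subset.
conn.executescript("""
    CREATE TABLE customers_subset AS
        SELECT * FROM customers WHERE region = 'EU';
    CREATE TABLE orders_subset AS
        SELECT o.* FROM orders o
        JOIN customers_subset c ON o.customer_id = c.id;
""")

print(conn.execute("SELECT COUNT(*) FROM customers_subset").fetchone())  # (2,)
print(conn.execute("SELECT COUNT(*) FROM orders_subset").fetchone())     # (2,)
```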
In software testing, bad data can produce inaccurate results, false positives, and false negatives, with serious consequences for the quality and reliability of the software. Bad data may result in:
A word of advice: Almost half of all test data automation failures can be attributed to bad data. High-quality test data is required to eliminate bad data conditions.
When we speak with large enterprises, we ask, “Which database technologies do you use?” The usual response is, “At least one of each!” Most large organizations today use legacy systems while striving to regularly upgrade their tech stack and integrate the latest technologies.
Some test data management tools support only a limited list of database vendors and versions, typically skewed toward the most popular ones. According to analyst firm Gartner: “Modern applications rely on an increasing number of interconnected data stores, applications, and APIs to function, requiring tools to coordinate and synchronize changes while ensuring relational consistency and addressing security and speed mandates.”
Consequently, enterprise applications rely on an ever-growing number of data sources.
A word of advice: For the highest quality test data, it’s important to be able to integrate with any kind of data source or technology, including those that may be adopted in the future.
Personally Identifiable Information (PII) discovery in production databases is the process of identifying and protecting sensitive information that can be used to identify an individual. This can include information such as name, social security number, driver's license number, passport number, email address, and other similar identifying information.
PII discovery in production databases is important because it helps to protect the privacy of individuals and prevent identity theft. Production databases may contain large amounts of sensitive data, and it can be difficult to identify all of the data that needs to be protected. PII discovery tools can be used to scan databases and identify any data that contains sensitive information.
Once the sensitive data is identified, appropriate steps can be taken to protect it, such as encrypting the data, restricting access to it, or deleting it entirely if it is no longer needed. PII discovery can also help organizations comply with regulatory requirements, such as the General Data Protection Regulation (GDPR) in the European Union or the California Consumer Privacy Act (CCPA) in the US.
A word of advice: Automatically discover sensitive data in any data store, and take the appropriate steps to protect it according to internal policies.
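As a rough illustration, here is a minimal sketch of pattern-based PII discovery that scans sampled column values with regular expressions for two common identifier types (email and US SSN). The column names, sample data, and patterns are hypothetical; real discovery tools also use column-name heuristics, dictionaries, and classifiers.

```python
# Minimal sketch of regex-based PII discovery over hypothetical column samples.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def discover_pii(columns: dict[str, list[str]]) -> dict[str, set[str]]:
    """Return, per column, the set of PII types detected in its sampled values."""
    findings: dict[str, set[str]] = {}
    for column, samples in columns.items():
        for pii_type, pattern in PII_PATTERNS.items():
            if any(pattern.search(value) for value in samples):
                findings.setdefault(column, set()).add(pii_type)
    return findings

sampled = {
    "contact":  ["alice@example.com", "bob@example.org"],
    "notes":    ["call back Tuesday", "SSN on file: 123-45-6789"],
    "order_id": ["A-1001", "A-1002"],
}
print(discover_pii(sampled))
# {'contact': {'email'}, 'notes': {'us_ssn'}}
```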
Masking data (like PII) is often a laborious, repetitive, and time-consuming task for DevOps personnel. Data masking techniques come in many forms; here are a few examples:
A word of advice: Make your test data available to a wide variety of data masking techniques, as well as the possibility to combine data masking tools with data tokenization tools, and/or synthetic data generation tools.
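To make the techniques concrete, here is a minimal sketch of three common masking approaches applied to hypothetical values: substitution (replace with a fixed placeholder), shuffling (permute values across rows), and partial redaction (keep only the last characters). The field values and rules are illustrative only.

```python
# Minimal sketch of three common data masking techniques.
import random

def substitute(value: str, placeholder: str = "REDACTED") -> str:
    # Substitution: replace the real value with a fixed placeholder.
    return placeholder

def shuffle_column(values: list[str], seed: int = 42) -> list[str]:
    # Shuffling: permute real values across rows so they no longer match their owners.
    shuffled = values[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled

def redact_partial(value: str, keep_last: int = 4) -> str:
    # Partial redaction: hide all but the last few characters.
    return "*" * max(len(value) - keep_last, 0) + value[-keep_last:]

names = ["Alice", "Bob", "Chen"]
print(substitute("Alice"))                 # REDACTED
print(shuffle_column(names))               # e.g. ['Chen', 'Alice', 'Bob']
print(redact_partial("4111111111111111"))  # ************1111
```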
Referential integrity, in this context, refers to masking each type of sensitive data with the same algorithm to ensure consistency across databases. Data masking tools and processes must be synchronized across the organization for each data type, to keep the data functional for analytics and other use cases. If data is masked in each source separately, it becomes difficult, if not impossible, to maintain consistency for the same business entity across systems, and creating the linkage between entities is a time-consuming and complex procedure.
According to analyst firm Gartner: “With modern applications relying on an increasing number of interconnected data stores (many of which are technologically vastly more complex), applications and APIs to function, testing has become more complex. Such complexity demands that tools support the ability to coordinate and synchronize changes across different data stores to ensure relational consistency while addressing security and speed mandates.”
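To illustrate the consistency point, here is a minimal sketch of deterministic masking using a keyed hash: the same input (say, a customer ID) always produces the same masked value, so two independently masked data stores still join on the same entity. The key, prefix, and format are hypothetical, and key management is out of scope.

```python
# Minimal sketch of deterministic (consistent) masking with an HMAC.
# The same input and key always yield the same output, so masking the same
# customer ID in two different data stores preserves referential integrity.
import hmac
import hashlib

MASKING_KEY = b"shared-secret-key"  # hypothetical; keep in a secrets manager in practice

def mask_id(value: str) -> str:
    digest = hmac.new(MASKING_KEY, value.encode(), hashlib.sha256).hexdigest()
    return "CUST-" + digest[:12]

# The same entity masked in two "systems" produces an identical masked value,
# so cross-system joins on the masked key still work.
print(mask_id("customer-8842"))
print(mask_id("customer-8842"))  # same output as the line above
```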
Dynamic data masking is a technique that protects sensitive data by masking it in flight. By hiding sensitive data from unauthorized users, but not from authorized ones, it helps enforce privacy regulations by:
A word of advice: Avoid static data masking, which provides a window of opportunity for hackers to breach the staging environment and retrieve production data before it’s masked. The process can also be cumbersome and complex, sometimes requiring coding for various integrations, transformations, and maintaining referential integrity.
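A minimal sketch of the dynamic masking idea follows: values are masked at read time based on the caller's role, so unmasked data is never handed to unauthorized users or persisted in a staging environment. The roles, field list, and masking rule are hypothetical.

```python
# Minimal sketch of dynamic (in-flight) data masking: sensitive fields are masked
# per request, depending on the caller's role, rather than rewritten at rest.

SENSITIVE_FIELDS = {"ssn", "email"}          # hypothetical masking policy
AUTHORIZED_ROLES = {"compliance_officer"}    # roles allowed to see raw values

def mask_value(value: str) -> str:
    return "*" * max(len(value) - 4, 0) + value[-4:]

def fetch_row(row: dict[str, str], role: str) -> dict[str, str]:
    """Return the row, masking sensitive fields unless the role is authorized."""
    if role in AUTHORIZED_ROLES:
        return dict(row)
    return {k: mask_value(v) if k in SENSITIVE_FIELDS else v for k, v in row.items()}

row = {"name": "Alice", "ssn": "123-45-6789", "email": "alice@example.com"}
print(fetch_row(row, role="qa_engineer"))         # ssn and email masked
print(fetch_row(row, role="compliance_officer"))  # returned unmasked
```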
As software applications evolve (or new applications are created), new versions require test data that isn't available in production databases. To support new functionality or requirements, applications may need updated schemas. Instead of crafting test data manually, a test data management tool can generate realistic, synthetic test data to fill coverage gaps or populate new or updated schemas. Synthetic data generation allows for greater:
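For a sense of what synthetic generation looks like, here is a minimal sketch that produces fake customer rows with only the standard library. The schema (id, name, email, signup_date) and value pools are hypothetical; real generators also honor constraints, distributions, and cross-table relationships.

```python
# Minimal sketch of synthetic test data generation for a hypothetical customers table.
import random
from datetime import date, timedelta

FIRST_NAMES = ["Alice", "Bob", "Chen", "Dana"]
DOMAINS = ["example.com", "example.org"]

def synthetic_customer(customer_id: int, rng: random.Random) -> dict:
    name = rng.choice(FIRST_NAMES)
    return {
        "id": customer_id,
        "name": name,
        "email": f"{name.lower()}{customer_id}@{rng.choice(DOMAINS)}",
        "signup_date": (date(2024, 1, 1) + timedelta(days=rng.randrange(365))).isoformat(),
    }

rng = random.Random(7)  # seeded, so test runs are repeatable
for customer_id in range(1, 4):
    print(synthetic_customer(customer_id, rng))
```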
Having spent much time and effort preparing the "perfect" test data set for a given sprint (or test cycle, such as Integration Test, UAT, or Performance Test), DevOps test data management teams are required to prepare a new testing environment for the next sprint (or test cycle).
A word of advice: Allow test data to be provisioned directly from any source (including non-production environments) to any target – without the need for complicated, expensive, and long setups.
Initial setup and refresh of the staging area for test environments is a time-consuming process: ingesting or refreshing data from the source systems into the staging environment, running masking jobs, and then provisioning the data to lower environments.
At times it becomes necessary for the QA team to test software functionality with up-to-date data from production sources. In many enterprises, test data is refreshed once a quarter, at best. In some testing scenarios, for example, when recreating bugs reported in production, having a near real-time refresh of source data is crucial.
A word of advice: Provision test data from production in seconds/minutes, rather than days/weeks.
It's not uncommon for testers to inadvertently overwrite each other's test data, resulting in corrupted test data, lost time, and wasted effort. Test data must then be provisioned again, and tests rerun. The right test data management tool can resolve this issue by segregating test data per individual tester, mitigating the risk of corruption.
A word of advice: Allow for test data versioning per individual tester.
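As a rough illustration of per-tester isolation, here is a sketch that gives each tester a private, versioned snapshot of a shared dataset; the naming scheme and in-memory storage are hypothetical stand-ins for database clones or virtual copies.

```python
# Minimal sketch of per-tester test data versioning: each tester checks out an
# isolated, versioned snapshot, so one tester's changes never overwrite another's.
import copy

shared_dataset = {"customers": [{"id": 1, "name": "Alice"}]}
snapshots: dict[str, dict] = {}

def checkout(tester: str, version: int) -> dict:
    """Give a tester an isolated copy of the shared dataset, keyed by tester/version."""
    key = f"{tester}/v{version}"
    snapshots[key] = copy.deepcopy(shared_dataset)
    return snapshots[key]

dana_data = checkout("dana", 1)
eli_data = checkout("eli", 1)
dana_data["customers"][0]["name"] = "CHANGED"

print(eli_data["customers"][0]["name"])        # Alice (untouched by Dana's edit)
print(shared_dataset["customers"][0]["name"])  # Alice (shared copy also untouched)
```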
Automated test data provisioning delivers better:
A word of advice: Make sure you can provision test data via APIs, as part of a CI/CD pipeline.
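As a sketch of what API-driven provisioning in a pipeline might look like, here is a hypothetical call to a test data provisioning endpoint made from a CI step. The URL, payload fields, and TEST_DATA_API_TOKEN environment variable are assumptions for illustration, not a real product API.

```python
# Minimal sketch of provisioning test data via an API call from a CI/CD step.
import json
import os
import urllib.request

def provision_test_data(environment: str, dataset: str) -> dict:
    payload = json.dumps({"environment": environment, "dataset": dataset}).encode()
    request = urllib.request.Request(
        url="https://testdata.example.internal/api/v1/provision",  # placeholder URL
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['TEST_DATA_API_TOKEN']}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)

if __name__ == "__main__":
    # Typically invoked as a pipeline step before the test stage runs.
    result = provision_test_data(environment="qa-2", dataset="masked-customers-subset")
    print(result)
```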
Practice what you preach with K2view Test Data Management software.