When evaluating a data masking tool, make sure it handles all types of data masking, and includes certain key features. Read on for the complete checklist, and learn how business entities are changing the game.
Table of Contents
Not all data masking tools are created equal
Top Drivers of Data Masking
Offer Multiple Types of Data Masking
Mask Unstructured Data
Maintain Relational Consistency
Supply Reporting and Auditing Functionality
Make it Happen with a Business Entity Approach
For enterprises that aim to maintain a fast pace of development while complying with data privacy regulations, data masking is a must – but not all data masking solutions offer the same capabilities.
Before choosing the right data masking tool for your organization, it’s vital to ensure your selection provides different types of data masking and includes several key features (such as PII discovery, reporting tools, and more).
Bookmark this article to keep our data masking checklist handy.
Before we dive into the checklist, let’s recap what data masking is and understand why it’s so important today.
Data masking is a method of data obfuscation and is considered a must for complying with GDPR, CCPA, and other data privacy regulations. It works by replacing PII (personally identifiable information) with scrambled, yet statistically equivalent, data. Although masked data cannot be reverse-engineered, it remains functional for non-production use cases, such as test data management.
The top 4 drivers of data masking among enterprises today include:
Expansion and maturation of privacy laws
The regulatory landscape is becoming increasingly stringent. Non-compliance with laws and standards such as PCI DSS, HIPAA, GDPR, CPRA/CCPA, and LGPD can cost companies millions of dollars (under GDPR, up to 4% of annual turnover or €20 million, whichever is higher), not to mention the costs of litigation and brand/reputational damage. In 2021, EU data protection authorities issued $1.25 billion in fines for noncompliance, up nearly sevenfold from 2020.
Insider threats
Users of non-production databases (programmers, testers, and DBAs), as well as users with access to insufficiently protected production environments, all pose potential threats. Today, a whopping 60% of data breaches are attributed to insiders. This is a growing concern, as insider security incidents have increased by 47% since 2018, while the cost has risen 31% in the same timeframe. As of 2022, the average annual cost of an insider threat is estimated at $11.5 million.
Expansion of ML/AI projects
Data privacy and security are considered primary obstacles to AI implementations among AI and ML engineers. Moreover, the growth of ML/AI projects, as well as the migration of these projects to the cloud, increases the risk of a breach.
Remote work
Thanks to the COVID-19 pandemic, company workforces are more dispersed than ever. As a result, it’s far more difficult to ensure compliance with internal data privacy policies, or safe access to internal networks.
When evaluating data masking solutions, make sure to select one that offers many different types of data masking. Here’s a list of the 7 most important data masking techniques:
Data anonymization
Data anonymization involves removing or encrypting the sensitive data found in a dataset. It minimizes the risk of a breach when data is in transit, while maintaining the data’s structure to support analytics.
Data pseudonymization
Pseudonymization swaps PII, such as a name or driver's license number, with a fake name or random figures. This is a reversible process, and can also be applied to unstructured data, like a passport scan.
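As a minimal sketch of the reversible swap described above, the following Python example replaces each value with a random token and keeps the mapping so that authorized code can restore the original. The `Pseudonymizer` class, its method names, and the `PSN-` token prefix are hypothetical, chosen only for illustration; a real deployment would keep the mapping in a protected store rather than in memory.

```python
import secrets


class Pseudonymizer:
    """Hypothetical in-memory pseudonymizer (illustrative only).

    Swaps each value for a random token and keeps the mapping, so the
    process can be reversed by code that has access to this object.
    """

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def pseudonymize(self, value):
        # Reuse the existing token so the same input always gets the same output.
        if value in self._to_token:
            return self._to_token[value]
        token = "PSN-" + secrets.token_hex(8)
        while token in self._to_value:  # guard against an (unlikely) collision
            token = "PSN-" + secrets.token_hex(8)
        self._to_token[value] = token
        self._to_value[token] = value
        return token

    def reidentify(self, token):
        # Reversal is only possible with the stored mapping.
        return self._to_value[token]


pseudonymizer = Pseudonymizer()
token = pseudonymizer.pseudonymize("Jane Doe")
original = pseudonymizer.reidentify(token)
```

Because the token is random, it reveals nothing about the original value; the stored mapping is what makes the process reversible, so that mapping needs the same protection as the data itself.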
Encrypted lookup substitution
A lookup table provides realistic alternative values for sensitive data, which can be swapped into production datasets. However, it’s critical to encrypt these tables to prevent a breach.
Redaction
Redaction means replacing sensitive data with generic values in development and testing environments. This technique is useful when the sensitive data itself isn’t necessary for QA or development, and when test data can be different from the original datasets.
Shuffling
Instead of substituting data with generic values, shuffling randomly rearranges the existing values within a column. For example, instead of replacing employee names with fake ones, it scrambles all of the real names in a dataset across multiple records, so that a name no longer lines up with its original record.
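Shuffling can be sketched in a few lines of Python. The example below is an illustration under assumed names (`shuffle_column`, the `employees` records, and the `seed` parameter are all hypothetical): it reassigns the values of one field across records while leaving every other field in place.

```python
import random


def shuffle_column(records, field, seed=None):
    """Return copies of `records` with the values of `field` randomly reassigned."""
    values = [record[field] for record in records]
    # A random permutation may leave some values in their original position.
    random.Random(seed).shuffle(values)
    return [{**record, field: value} for record, value in zip(records, values)]


employees = [
    {"name": "Alice", "salary": 5200},
    {"name": "Bob", "salary": 4800},
    {"name": "Carol", "salary": 6100},
    {"name": "Dave", "salary": 5500},
]
shuffled = shuffle_column(employees, "name", seed=7)
```

The set of names in the column is unchanged, which keeps the data realistic, but the link between a name and the rest of its record is broken. On small datasets this offers limited protection, since the real values are still present.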
Data aging
If data includes confidential dates, you can apply policies to each data field to conceal the true date. For example, you can set back the dates by 150 or 1,700 days, to maximize concealment.
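A date-aging policy like the one described above could look like the following Python sketch. The function names and the sample records are assumptions made for illustration; the 150-day offset comes from the example in the text.

```python
from datetime import date, timedelta


def age_date(original, days_back):
    """Shift a date into the past by a fixed number of days."""
    return original - timedelta(days=days_back)


def age_records(records, field, days_back):
    """Apply the same offset to one date field in every record (returns copies)."""
    return [{**r, field: age_date(r[field], days_back)} for r in records]


visits = [
    {"patient": "P-1", "admitted": date(2023, 6, 30)},
    {"patient": "P-2", "admitted": date(2023, 7, 10)},
]
aged = age_records(visits, "admitted", days_back=150)
```

Using one offset for a whole field conceals the true dates while preserving the intervals between them, which matters when tests or analytics depend on elapsed time.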
Nulling out
This data masking technique protects sensitive data by applying a null value to a data column, so unauthorized users won’t be able to see it.
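Two of the simpler techniques in the list above, redaction and nulling out, can be sketched together in Python. The helper names, the `REDACTED` placeholder, and the sample record are hypothetical and serve only to show the difference between the two.

```python
def redact(record, fields, placeholder="REDACTED"):
    """Replace the listed fields with a generic placeholder value."""
    return {k: (placeholder if k in fields else v) for k, v in record.items()}


def null_out(record, fields):
    """Replace the listed fields with None (NULL in a database column)."""
    return {k: (None if k in fields else v) for k, v in record.items()}


customer = {"id": 17, "name": "Jane Doe", "ssn": "123-45-6789", "plan": "gold"}

# Redact the name, then null out the SSN; the other fields pass through unchanged.
masked = null_out(redact(customer, {"name"}), {"ssn"})
```

Redaction keeps a visible, non-null value in the field, which helps when downstream code rejects empty values, while nulling out removes the value entirely.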
It’s estimated that up to 90% of all enterprise data is unstructured, or qualitative, data. Sensitive unstructured data can be found within images, PDF contracts and agreements, driver's licenses, XML documents, chats, and more. It is often stored in shared files and content management systems, as well as in BLOBs or CLOBs within databases.
Unstructured data can’t be processed or analyzed by conventional data tools. Since unstructured data doesn’t adhere to a predefined data model, it must be managed in either a non-relational (NoSQL) database or a data lake to preserve it in its raw form.
Unless your data masking solution is capable of masking unstructured data, data privacy management won’t be possible.
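For free text such as chat logs, one simple approach is pattern-based masking. The Python sketch below is an illustration only: the `PATTERNS` table and `mask_text` function are hypothetical, the regular expressions cover just email addresses and US-style Social Security numbers, and images or scanned documents would need additional steps (such as text extraction) that are not shown here.

```python
import re

# Hypothetical pattern table: label -> compiled regular expression.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_text(text):
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


message = "Contact jane.doe@example.com, SSN 123-45-6789."
masked_message = mask_text(message)
# masked_message == "Contact [EMAIL], SSN [SSN]."
```

Pattern matching only catches PII that follows a predictable format; names and addresses in running text generally need more than regular expressions.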
Anonymized data must be represented consistently throughout your databases. Maintaining relational consistency requires every type of data originating from a certain business system to be masked with the same algorithm.
While consistent masking protects your systems from cyberattacks, it also ensures data remains functional for analytics and other business use cases. Therefore, when choosing a data masking tool, you’ll want to make sure your selection is built to automatically apply the same types of data masking techniques and algorithms to your business systems.
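One common way to get consistent results across systems is deterministic masking with a keyed hash: the same input and key always produce the same masked value, so joins between masked tables still work. The Python sketch below assumes this approach for illustration; it is not necessarily how any particular tool does it, and `mask_id`, `KEY`, and the sample tables are hypothetical.

```python
import hashlib
import hmac


def mask_id(value, key):
    """Deterministically mask an identifier with HMAC-SHA256 (first 16 hex chars)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


# Demo key only; a real key would come from a secrets manager.
KEY = b"demo-key-do-not-use-in-production"

customers = [{"customer_id": "C-1001", "segment": "retail"}]
orders = [
    {"order_id": "O-1", "customer_id": "C-1001"},
    {"order_id": "O-2", "customer_id": "C-1001"},
]

# Apply the same function and key to the same field in both tables.
masked_customers = [{**c, "customer_id": mask_id(c["customer_id"], KEY)} for c in customers]
masked_orders = [{**o, "customer_id": mask_id(o["customer_id"], KEY)} for o in orders]
```

Both orders still point to the same masked customer, so the relationship survives masking even though the original identifier no longer appears in either table.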
As we’ve already discussed, complying with data privacy regulations is a top priority for enterprises today. That’s why the data masking solution you choose should come with built-in reporting and auditing functionality.
Specifically, you should be able to clearly visualize and report on:
All masking activities and instances
Data dependencies and relationships
Applied masking techniques
With these insights, you can stay on top of all data masking activity, monitor use of masked data, and identify any gaps.
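To make the reporting idea concrete, here is a minimal Python sketch of an audit log for masking activity. All class, method, and field names are hypothetical; the sketch covers masking events, applied techniques, and gap detection, and leaves out data dependencies and relationships.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MaskingEvent:
    table: str
    column: str
    technique: str
    rows_masked: int
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class MaskingAuditLog:
    """Hypothetical in-memory audit log for masking runs."""

    def __init__(self):
        self.events = []

    def record(self, table, column, technique, rows_masked):
        event = MaskingEvent(table, column, technique, rows_masked)
        self.events.append(event)
        return event

    def rows_by_technique(self):
        """Total rows masked, grouped by technique."""
        totals = Counter()
        for event in self.events:
            totals[event.technique] += event.rows_masked
        return dict(totals)

    def unmasked_columns(self, sensitive_columns):
        """Sensitive (table, column) pairs with no recorded masking event."""
        covered = {(e.table, e.column) for e in self.events}
        return sorted(set(sensitive_columns) - covered)


log = MaskingAuditLog()
log.record("customers", "ssn", "nulling out", 1200)
log.record("customers", "name", "shuffling", 1200)
log.record("orders", "ship_date", "data aging", 5400)

gaps = log.unmasked_columns([("customers", "ssn"), ("customers", "email")])
```

Here `gaps` lists the `email` column as sensitive but not yet masked, which is the kind of gap a reporting feature should surface.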
The easiest way to check every box on this checklist is by going with entity-based data masking technology, where a business entity could be a customer, payment, order, or device. Instead of centralizing sensitive information in a vault or repository, like most other data protection solutions, every instance of business entity data is managed in its own individually encrypted Micro-Database™.
A business entity approach enables all of the types of data masking outlined in the checklist above, while ensuring relational consistency and security. It protects data at rest, in use, and in transit – giving production, testing, and analytical teams the ability to use data as needed, while minimizing the risk of a breach or non-compliance. It also performs dynamic, static, and on-the-fly structured data masking and unstructured data masking, so you can be sure all sensitive data residing anywhere in your business systems is secured and protected.