    Types of Data Masking and Key Feature Requirements – Buyer's Checklist

    Amitai Richman

    Product Marketing Director

    When evaluating a data masking tool, make sure it handles all types of data masking, and includes certain key features. Read on for the complete checklist, and learn how business entities are changing the game.

    Table of Contents


    Not all data masking tools are created equal
    Top Drivers of Data Masking
    Offer Multiple Types of Data Masking
    Mask Unstructured Data
    Maintain Relational Consistency
    Supply Reporting and Auditing Functionality
    Make it Happen with a Business Entity Approach

    Not all data masking tools are created equal

    For enterprises that aim to maintain a fast pace of development while complying with data privacy regulations, data masking is a must – but not all data masking solutions offer the same capabilities.

    Before choosing the right data masking tool for your organization, it’s vital to ensure your selection provides different types of data masking and includes several key features (such as PII discovery, reporting tools, and more).

    Bookmark this article to keep our data masking checklist handy.

    Top Drivers of Data Masking

    Before we dive into the checklist, let’s recap what data masking is and understand why it’s so important today.

    Data masking is a method of data obfuscation and is considered a must for complying with GDPR, CCPA, and other data privacy regulations. It works by replacing PII (personally identifiable information) with scrambled, yet statistically equivalent, data. Although the masked data cannot be reverse-engineered, it remains functional for use cases such as test data management.

    The top 4 drivers of data masking among enterprises today are:

    1. Expansion and maturation of privacy laws
      The regulatory landscape is becoming increasingly stringent. Non-compliance with laws and standards such as PCI DSS, HIPAA, GDPR, CPRA/CCPA, and LGPD can cost companies millions of dollars (GDPR fines alone can reach 4% of annual global turnover or €20 million, whichever is higher), not to mention the costs of litigation and brand/reputational damage. In 2021, EU data protection authorities issued $1.25 billion in fines for noncompliance, up nearly sevenfold from 2020.

    2. Insider threats
      Users of non-production databases (programmers, testers, and DBAs), as well as users with access to insufficiently protected production environments, all pose potential threats. Today, a whopping 60% of data breaches are attributed to insiders. This is a growing concern, as insider security incidents have increased by 47% since 2018, while the cost has risen 31% in the same timeframe. As of 2022, the average annual cost of an insider threat is estimated at $11.5 million.

    3. Expansion of ML/AI projects
      Data privacy and security are considered a primary obstacle to AI implementations among AI and ML engineers. Moreover, the growth of ML/AI projects, as well as the migration of these projects to the cloud, increases the risk of a breach.

    4. Remote work
      Thanks to the COVID-19 pandemic, company workforces are more dispersed than ever. As a result, it’s far more difficult to ensure compliance with internal data privacy policies, or safe access to internal networks.

    Make sure your masking tool can:


    1. Offer Multiple Types of Data Masking


    When evaluating data masking solutions, make sure to select one that offers many different types of data masking. Here’s a list of the 7 most important data masking techniques:

    1. Data anonymization
      Data Anonymization involves removing or encrypting the sensitive data found in a dataset. It minimizes the risk of a breach when data is in transit, while maintaining the data’s structure to support analytics.

    2. Data pseudonymization
      Pseudonymization swaps PII, such as a name or driver's license number, with a fake name or random figures. This is a reversible process, and can also be applied to unstructured data, like a passport scan.

    3. Encrypted lookup substitution
      A lookup table provides realistic alternative values to sensitive data, which can be swapped into production datasets. However, it’s critical to encrypt these tables to prevent a breach.

    4. Redaction
      Redaction means replacing sensitive data with generic values in development and testing environments. This technique is useful when the sensitive data itself isn’t necessary for QA or development, and when test data can be different from the original datasets.

    5. Shuffling
      Instead of substituting data with generic values, shuffling rearranges the existing values of a column across records. For example, instead of replacing employee names with fake ones, it scrambles all of the real names in a dataset, across multiple records, so that no name stays attached to its original record.

    6. Data aging
      If data includes confidential dates, you can apply policies to each data field to conceal the true date. For example, you can set back the dates by 150 or 1,700 days, to maximize concealment.

    7. Nulling out
      This data masking technique protects sensitive data by applying a null value to a data column, so unauthorized users won’t be able to see it.
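    Several of the techniques above (redaction, shuffling, data aging, and nulling out) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not how any particular masking product works; the sample records, field names, and helper functions are all hypothetical.

```python
import random
from datetime import date, timedelta

# Hypothetical sample records; field names are illustrative only.
records = [
    {"name": "Alice Ng", "ssn": "123-45-6789", "hired": date(2020, 3, 15), "salary": 82000},
    {"name": "Bob Ruiz", "ssn": "987-65-4321", "hired": date(2018, 7, 1), "salary": 91000},
    {"name": "Cara Lee", "ssn": "555-12-3456", "hired": date(2022, 11, 30), "salary": 67000},
]

def redact(value, keep_last=4, fill="X"):
    """Redaction: replace all but the last few characters with a generic filler.

    Separators such as '-' are left in place so the format stays recognizable.
    """
    text = str(value)
    return "".join(
        c if (i >= len(text) - keep_last or not c.isalnum()) else fill
        for i, c in enumerate(text)
    )

def shuffle_column(rows, field, seed=None):
    """Shuffling: redistribute a column's real values across records."""
    values = [row[field] for row in rows]
    random.Random(seed).shuffle(values)
    return [{**row, field: v} for row, v in zip(rows, values)]

def age_date(value, days):
    """Data aging: set a date back by a fixed number of days."""
    return value - timedelta(days=days)

def null_out(rows, field):
    """Nulling out: replace every value in a column with a null."""
    return [{**row, field: None} for row in rows]

# Apply the techniques in sequence to produce a masked copy.
masked = shuffle_column(records, "name", seed=7)
masked = null_out(masked, "salary")
masked = [
    {**row, "ssn": redact(row["ssn"]), "hired": age_date(row["hired"], 150)}
    for row in masked
]
```

    Each helper returns new rows rather than modifying the originals, so the source records stay untouched while the masked copy is handed to test or analytics teams.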

    2. Mask Unstructured Data


    It’s estimated that up to 90% of all enterprise data is unstructured, or qualitative, data. Sensitive unstructured data can be found within images, PDF contracts and agreements, driver's licenses, XML documents, chats, and more. It is often stored in shared files, content management systems, and BLOBs or CLOBs within databases.

    Unstructured data can’t be processed or analyzed by conventional data tools. Since unstructured data doesn’t adhere to a predefined data model, it must be managed in either a non-relational (NoSQL) database or a data lake to preserve it in its raw form.

    Unless your data masking solution is capable of masking unstructured data, data privacy management won’t be possible.
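    For free text such as chat logs, one simple starting point is pattern-based masking. The sketch below is an assumption-laden illustration: the two regular expressions (email addresses and US-style Social Security numbers) and the `mask_text` helper are hypothetical, and masking images or scanned documents would additionally require OCR or similar processing that is not shown here.

```python
import re

# Hypothetical patterns; real deployments need far broader PII coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_text(text):
    """Replace each pattern match with a label and count the replacements."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts

note = "Customer jane.doe@example.com called about SSN 123-45-6789."
masked_note, found = mask_text(note)
print(masked_note)
print(found)
```

    Returning the match counts alongside the masked text makes it easy to log how much sensitive content was found in each document.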

    3. Maintain Relational Consistency


    Anonymized data must be represented consistently throughout your databases. Maintaining relational consistency requires every type of data originating from a certain business system to be masked with the same algorithm.

    While consistent masking protects your systems from cyberattacks, it also ensures data remains functional for analytics and other business use cases. Therefore, when choosing a data masking tool, you’ll want to make sure your selection is built to automatically apply the same types of data masking techniques and algorithms to your business systems.
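    One common way to get this kind of consistency is deterministic masking, where the same input always yields the same masked value. The sketch below uses a keyed hash (HMAC-SHA256) from the Python standard library; the key, table layouts, and `mask_id` helper are hypothetical, and this is one possible technique rather than the method used by any specific tool.

```python
import hashlib
import hmac

# Assumption: in practice the key would come from a secrets manager.
SECRET_KEY = b"demo-key-do-not-use-in-production"

def mask_id(value, key=SECRET_KEY, length=12):
    """Deterministically mask an identifier with a keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]

# Hypothetical tables that share a customer_id key.
customers = [{"customer_id": "C-1001", "tier": "gold"}]
orders = [
    {"order_id": "O-1", "customer_id": "C-1001"},
    {"order_id": "O-2", "customer_id": "C-1001"},
]

masked_customers = [{**c, "customer_id": mask_id(c["customer_id"])} for c in customers]
masked_orders = [{**o, "customer_id": mask_id(o["customer_id"])} for o in orders]

# The masked key still joins the two tables.
joined = [
    o for o in masked_orders
    if o["customer_id"] == masked_customers[0]["customer_id"]
]
```

    Because both tables are masked with the same function and key, joins on `customer_id` keep working after masking, while the original identifier no longer appears in either table.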

    4. Supply Reporting and Auditing Functionality


    As we’ve already discussed, complying with data privacy regulations is a top priority for enterprises today. That’s why the data masking solution you choose should come with built-in reporting and auditing functionality.

    Specifically, you should be able to clearly visualize and report on:

    • All masking activities and instances

    • Data dependencies and relationships

    • Applied masking techniques

    With these insights, you can stay on top of all data masking activity, monitor use of masked data, and identify any gaps.
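    As a rough illustration of what such reporting might track, the sketch below records each masking operation in an in-memory audit log and flags sensitive columns that have not been masked. The `MaskingEvent` structure, the helper functions, and the table and column names are all hypothetical; a real tool would persist this information and expose it through its own reporting interface.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MaskingEvent:
    """One masking activity: what was masked, how, and when."""
    table: str
    column: str
    technique: str
    rows_masked: int
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

audit_log = []

def record(table, column, technique, rows_masked):
    """Append a masking event to the audit log."""
    event = MaskingEvent(table, column, technique, rows_masked)
    audit_log.append(event)
    return event

def unmasked_columns(sensitive_columns):
    """Return sensitive (table, column) pairs with no masking event."""
    masked = {(e.table, e.column) for e in audit_log}
    return sorted(set(sensitive_columns) - masked)

record("customers", "ssn", "redaction", 1200)
record("customers", "name", "shuffling", 1200)
record("orders", "card_number", "redaction", 5400)

# Report on applied techniques and identify gaps.
by_technique = Counter(e.technique for e in audit_log)
gaps = unmasked_columns([
    ("customers", "ssn"),
    ("customers", "email"),
    ("orders", "card_number"),
])
```

    Here `gaps` lists the columns known to be sensitive that never appear in the log, which is the kind of gap the reporting described above is meant to surface.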

    Make it Happen with a Business Entity Approach

    The easiest way to check every box on this checklist is by going with entity-based data masking technology, where a business entity could be a customer, payment, order, or device. Instead of centralizing sensitive information in a vault or repository, like most other data protection solutions, every instance of business entity data is managed in its own individually encrypted Micro-Database™.

    A business entity approach enables all of the types of data masking outlined in the checklist above, while ensuring relational consistency and security. It protects data at rest, in use, and in transit – giving production, testing, and analytical teams the ability to use data as needed, while minimizing the risk of a breach or non-compliance. It also performs dynamic, static, and on-the-fly structured data masking and unstructured data masking, so you can be sure all sensitive data residing anywhere in your business systems is secured and protected.
