Anonymization of data is the process of preserving privacy by deleting or encoding identifiers that link people to their sensitive information in datasets.
Table of Contents
What is Anonymization of Data?
4 Top Reasons for Anonymization of Data
What Data Needs Anonymization?
Drill Down: What PII Needs Anonymization?
When is Anonymization of Data Not Applicable?
Anonymization of Data Based on Business Entities
What is Anonymization of Data?
Anonymization of data is a way companies can protect sensitive, personal, or confidential information that is stored in on-prem or cloud databases – while still retaining the functionality of the data collected. By way of example, running Personally Identifiable Information (PII) like names, addresses, and Social Security Numbers through a data anonymization tool enables organizations to store and access data required for customer service or marketing purposes, while still keeping the source of the data confidential.
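As a rough illustration of the idea above, the following minimal sketch replaces a customer's name and Social Security Number with a keyed token and a masked value, while leaving the non-sensitive field usable. The field names, the `SECRET_KEY` value, and the helper functions are assumptions made for this example, not part of any particular tool. Strictly speaking, keyed hashing is a pseudonymization technique, so it is only one building block of an anonymization process.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would live in a secrets manager.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(value: str) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex chars."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_ssn(ssn: str) -> str:
    """Keep only the last four digits of a Social Security Number."""
    digits = "".join(c for c in ssn if c.isdigit())
    return "***-**-" + digits[-4:]


def anonymize_record(record: dict) -> dict:
    """Return a copy that keeps business fields but hides direct identifiers."""
    return {
        # The same person always maps to the same token, so records can still be joined.
        "customer_token": pseudonymize(record["name"] + "|" + record["ssn"]),
        "ssn": mask_ssn(record["ssn"]),
        # The street address is dropped entirely; the plan is not sensitive and is kept.
        "plan": record["plan"],
    }


record = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "address": "1 Main St, Springfield",
    "plan": "premium",
}
masked = anonymize_record(record)
print(masked["ssn"])   # ***-**-6789
print(masked["plan"])  # premium
```

Because the token is deterministic, a customer service or marketing system can still group records belonging to the same customer without ever seeing the name or full SSN.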
Anonymization of data is important – and frequently mandated by regulations like GDPR, CCPA, CPRA, and HIPAA – because it effectively removes identifiers from a given database. These identifiers, taken together or separately, could enable attackers to extract private information from data – which would put both security and privacy compliance at risk.
By applying data anonymization techniques, companies and organizations can adhere to strict data privacy regulations and avoid the public relations fallout and regulatory fines that frequently follow breaches. Yet simply stripping identifiers from data is not always sufficient. Attackers can apply de-anonymization techniques to re-identify individuals, typically by cross-referencing multiple data sources and reconstructing sensitive data from the associations between their fields. Because of this danger, it’s crucial to choose a data anonymization solution carefully.
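To make the cross-referencing risk concrete, here is a small sketch of a linkage attack. Both datasets, all names, and the choice of ZIP code, birth date, and sex as the linking fields are invented for illustration. The "anonymized" dataset has no names, yet it can be joined to a public one on the fields the two have in common.

```python
# Hypothetical "anonymized" dataset: names removed, but other fields kept.
medical = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical public dataset that includes names.
voters = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "Bob Sample", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
    {"name": "Carol Test", "zip": "02139", "birth_date": "1975-03-02", "sex": "F"},
]

# Fields that are not identifiers on their own but can identify someone in combination.
LINK_FIELDS = ("zip", "birth_date", "sex")


def link(anon_rows, public_rows, keys=LINK_FIELDS):
    """Re-identify rows whose combination of linking fields matches exactly one person."""
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    reidentified = []
    for row in anon_rows:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:  # a unique match ties the sensitive value to a name
            reidentified.append((matches[0], row["diagnosis"]))
    return reidentified


for name, diagnosis in link(medical, voters):
    print(name, "->", diagnosis)
```

In this toy data, both medical rows match exactly one voter, so removing names alone did not protect either person.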
4 Top Reasons for Anonymization of Data
There are numerous reasons that companies and organizations need anonymized data, notably to:
-
Avoid regulatory liability
Regulations are growing stricter every year, and exposure to regulatory fines and penalties following breaches of privacy is not something companies can take lightly. Anonymization of data is an important step toward meeting tough compliance requirements.
-
Enhance digital transformation and governance
Well-anonymized data powers digital transformation by increasing the pace of application development, and makes data governance tools more effective by shielding sensitive information both inside and outside the organization.
-
Lower risk
Anonymization of data helps protect against the potential loss of market share and public trust associated with breaches. Data protection mandates and public expectations of privacy and confidentiality demand ever tighter safeguards. When breaches occur, loss of faith in an organization can measurably damage brand equity and revenue alike.
-
Reduce insider threats
From malicious data leaks to simple phishing scams, the insider threat to data privacy is ever-present. Anonymization of data protects against data misuse or exploitation by employees, partners, or other trusted third parties.
What Data Needs Anonymization?
Of the many types of data collected, stored, and used by companies, the following are good candidates for anonymization:
-
Personally Identifiable Information (PII): Data that is linkable to a specific individual – like full names, fingerprints, facial photographs, addresses, Social Security Numbers, dates of birth, driver’s license numbers, and more – must be obscured.
-
Intellectual Property (IP): Trademarks, patents, and copyrights represent heavy technological and/or creative investments. Thus, it is frequently a matter of business continuity that this data is not exposed.
-
Protected Health Information (PHI): Medical histories, lab results, personal characteristics, diagnoses, insurance information, and more are highly confidential. Even when it is intended for medical or healthcare research, PHI needs to be anonymized before it can be used and, especially, released.
-
Payment Card Information (PCI): The Payment Card Industry Data Security Standard (PCI DSS) mandates that organizations guard cardholder data – name, PIN, Primary Account Number (PAN), and more – from unauthorized access.
Drill Down: What PII Needs Anonymization?
Lacking clear regulatory stipulations, the concept of “sensitive” can differ from industry to industry, and even between individuals in an organization. Most organizational policies and regulations concur that PII is sensitive and a good candidate for anonymization. But what comprises PII? Broadly, there is a consensus that PII includes:
-
Passwords
With access to an individual’s password, cybercriminals can easily impersonate that individual. Thus, passwords should be at the top of the data anonymization checklist.
-
Mobile telephone numbers
They may sound innocuous, but personal phone numbers are the gateway to accessing someone’s mobile device, which can contain highly sensitive data. For this reason, personal phone numbers need to be anonymized.
-
Photographs
Pictures are an ideal means of identification, and are frequently used to verify identity and ensure security. This means that a dataset with photos of individuals is a strong candidate for data anonymization.
-
Security questions
Many websites and applications use security questions as a second or third layer of authentication, making the answers key identifiers that should most definitely be anonymized.
-
Names
An individual’s name is typically the most direct identifier in a dataset and must be safeguarded via anonymization.
-
Credit card details
Taken together, a credit card number, expiration date, and CVV identify a specific cardholder’s account, and carry serious financial implications if compromised.
When is Anonymization of Data Not Applicable?
Organizations maintaining sensitive datasets should bear in mind that many dataset use cases actually don’t require sensitive data to produce satisfactory results. For example, a dataset that covers shopper buying habits according to age range does not need participant names or even exact ages. Similarly, there is not always a need for actual PII to effectively train AI algorithms. This means that some data doesn’t need to be anonymized because the sensitive portions can simply be eliminated, or because they can be replaced with synthetic datasets.
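The shopper example above can be sketched in a few lines. The record layout, the ten-year bucket width, and the function names are assumptions for illustration. Names are dropped and exact ages are replaced with ranges, yet the spend-by-age-range analysis still works.

```python
from collections import defaultdict


def age_range(age: int, width: int = 10) -> str:
    """Generalize an exact age into a bucket such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize(rows):
    """Drop names entirely and replace exact ages with age ranges."""
    return [
        {"age_range": age_range(r["age"]), "category": r["category"], "amount": r["amount"]}
        for r in rows
    ]


def spend_by_age_range(rows):
    """Total spend per age range, computed without any identifying fields."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["age_range"]] += r["amount"]
    return dict(totals)


# Hypothetical purchase records.
purchases = [
    {"name": "Jane Doe", "age": 34, "category": "books", "amount": 20.0},
    {"name": "John Roe", "age": 37, "category": "games", "amount": 30.0},
    {"name": "Ann Poe", "age": 52, "category": "books", "amount": 15.0},
]

report = spend_by_age_range(generalize(purchases))
print(report)  # {'30-39': 50.0, '50-59': 15.0}
```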
What’s more, some datasets can safely remain in their original formats. DBAs should understand that anonymization of data is not necessary in all cases, and identify which datasets need to be made anonymous, and which do not.
Anonymization of Data Based on Business Entities
The most effective and technologically advanced methodology for anonymization of data relies on entity-based data masking technology. A business entity corresponds to all the data associated with a specific device, customer, invoice, etc. Data relating to each business entity is stored in, and accessed from, an individually protected Micro-Database™. By using entity-based data anonymization that leverages intelligent business rules, organizations are better able to maintain productivity while still ensuring compliance and customer privacy.
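The general idea of masking by business entity can be sketched as follows. This is a simplified illustration under stated assumptions, not the vendor's actual implementation: the table layout, the `SENSITIVE_FIELDS` rule, the key, and all function names are hypothetical. The sketch gathers every row that belongs to one customer across several tables, then masks them together so that the same customer receives the same token everywhere and relationships between tables stay intact.

```python
import hashlib
import hmac

# Hypothetical key and masking rule, for illustration only.
MASKING_KEY = b"example-key-not-for-production"
SENSITIVE_FIELDS = {"name", "email"}


def entity_token(customer_id: str) -> str:
    """Derive one stable token per customer from a keyed hash of the original ID."""
    digest = hmac.new(MASKING_KEY, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "CUST-" + digest[:12]


def collect_entity(customer_id: str, tables: dict) -> dict:
    """Gather all rows for one customer across every table (the 'business entity')."""
    return {
        table: [row for row in rows if row["customer_id"] == customer_id]
        for table, rows in tables.items()
    }


def mask_entity(entity: dict) -> dict:
    """Mask one entity as a unit: drop sensitive fields, replace the ID consistently."""
    masked = {}
    for table, rows in entity.items():
        masked[table] = []
        for row in rows:
            clean = {k: v for k, v in row.items() if k not in SENSITIVE_FIELDS}
            clean["customer_id"] = entity_token(row["customer_id"])
            masked[table].append(clean)
    return masked


# Hypothetical source tables.
tables = {
    "customers": [
        {"customer_id": "42", "name": "Jane Doe", "email": "jane@example.com"},
        {"customer_id": "43", "name": "John Roe", "email": "john@example.com"},
    ],
    "invoices": [
        {"customer_id": "42", "invoice_no": "A-1", "total": 120.0},
        {"customer_id": "42", "invoice_no": "A-2", "total": 80.0},
        {"customer_id": "43", "invoice_no": "B-1", "total": 50.0},
    ],
}

masked_customer = mask_entity(collect_entity("42", tables))
print(masked_customer["invoices"])
```

Because masking is applied per entity, the two invoices in the output still point to the same masked customer record, which is what allows downstream applications to keep working on the masked data.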