With the ever-increasing use of relational databases in routine tasks such as storage of retail store transactional data, financial transactional data, sales data and employee data, it has become important to protect such data.
Problem
How could user and customer data be stored while keeping all its characteristics, and at the same time remain hidden from those who don’t need to access the data? An example where user data may need to be hidden is in targeted advertising. Targeted advertising makes use of a user’s personal information to create customized offers for the user. Not everyone may like being given direct advertisement offers.
Solution
Data masking is a technique used to hide data by obfuscating it when needed. Masking removes the identifying characteristics of data such that the user, or customer to whom the data pertains, cannot be identified from the data. Data masking anonymizes the data.
Note: Data masking is an irreversible process. Data that has been masked is irrecoverable; it cannot be converted back to its original form, which is why a copy of the data must be kept before masking it permanently.
Most relational databases, including MySQL Enterprise Edition and Oracle databases, provide SQL functions or tools to mask data. Toad Edge could also be used to run the MySQL examples in this article.
In the following subsections we shall discuss some examples of data masking with MySQL Enterprise Edition.
Displaying contact information
When displaying a user’s name, it is not always needed to display the user’s contact information such as phone number, email and such. MySQL database provides two general purpose functions for masking data. These are mask_inner(str, margin1, margin2 [, mask_char]) and mask_outer(str, margin1, margin2 [, mask_char]). The mask_inner() function masks the inner section of a string while keeping the margin1 number of characters on the left unmasked and margin2 number of characters on the right unmasked. By default, the x character is used to mask the string, but an alternative masking character may be specified with mask_char.
As an example, to mask a phone number while keeping 1 character on the left and 2 characters on the right unmasked, use the mask_inner()function as follows:
mysql> SELECT mask_inner('555-555-5555', 1, 2);
+———————————-+
| mask_inner('555-555-5555', 1, 2) |
+———————————-+
| 5XXXXXXXXX55 |
+———————————-+
The margin1 and margin2 must be non-negative integers; their value could be 0. As an example, mask all but 3 characters on the left in a phone number:
mysql> SELECT mask_inner('555-555-5555', 3, 0);
+———————————-+
| mask_inner('555-555-5555', 3, 0) |
+———————————-+
| 555XXXXXXXXX |
+———————————-+
Use an alternative masking character such as the '*' or the '#' character by specifying the making character as the third function argument:
mysql> SELECT mask_inner('555-555-5555', 1, 2, '*'), mask_inner('555-555-5555',3, 0, '#');
+—————————————+————————————–+
| mask_inner('555-555-5555', 1, 2, '*') | mask_inner('555-555-5555',3, 0, '#') |
+—————————————+————————————–+
| 5*********55 | 555######### |
+—————————————+————————————–+
The mask_outer function is the opposite; it masks the outer characters while keeping the inner ones unmasked. As an example, mask the first three and the last four characters in a phone number:
mysql> SELECT mask_outer('555-555-5555', 3, 4);
+———————————-+
| mask_outer('555-555-5555', 3, 4) |
+———————————-+
| XXX-555-XXXX |
+———————————-+
Again, an alternative masking character such as the '*' or the '#' character may be used by specifying the making character as the third function argument:
mysql> SELECT mask_outer('555-555-5555', 3, 4, '*'), mask_outer('555-555-5555',7, 0, '#');
+—————————————+————————————–+
| mask_outer('555-555-5555', 3, 4, '*') | mask_outer('555-555-5555',7, 0, '#') |
+—————————————+————————————–+
| ***-555-**** | #######-5555 |
+—————————————+————————————–+
Data could be masked while selecting and displaying data from a table. As an example, consider a table for employee data:
CREATE TABLE employee
(
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
f_name VARCHAR(40),
l_name VARCHAR(40),
email VARCHAR(11),
phone VARCHAR(10)
);
When displaying data from the table, the phone and email, data may be masked as follows:
SELECT id, f_name, l_name, mask_inner(email, 3,5) AS masked_email,
mask_outer(phone, 3,4) AS masked_phone FROM employee;
A view may be defined to mask table data. As an example, create a view on the same table as:
CREATE VIEW masked_employee AS
SELECT id, f_name, l_name,
mask_inner(email, 3,5) AS masked_email,
mask_outer(phone, 3,4) AS masked_phone FROM employee;
Subsequently select data from the view:
SELECT id, f_name, l_name,masked_email, masked_phone FROM masked_employee;
Displaying personal data
The general purpose functions may be used for most usage scenarios. Some specialized functions are provided for personal data such as social security number and primary account number. The mask_pan(str) function masks all but the last four numbers in a personal account number, as an example:
mysql> SELECT mask_pan('12345678910');
+————————-+
| mask_pan('12345678910') |
+————————-+
| XXXXXXX8910 |
+————————-+
The mask_pan_relaxed(str) function is somewhat relaxed in that it masks all but the first six and the last four characters in a personal account number.
mysql> SELECT mask_pan_relaxed('123456789101112');
+————————————-+
| mask_pan_relaxed('123456789101112') |
+————————————-+
| 123456XXXXX1112 |
+————————————-+
The specialized function mask_ssn(str) replaces all but the last four characters in a U.S. social security number with an ‘X’. An example:
mysql> SELECT mask_ssn('555-55-5555');
+————————-+
| mask_ssn('555-55-5555') |
+————————-+
| XXX-XX-5555 |
+————————-+
Data modeling and testing
It is not uncommon to mimic or fake actual data during data modeling and testing. MySQL database provides some functions to generate random, fake data for data modeling. The gen_range(lower, upper) function returns a random integer within a range set by the lower and upper function arguments. As an example, generate an integer in the range 1 to 10, and another integer in the range -10 to 10 as follows:
mysql> SELECT gen_range(1, 10), gen_range(-10, 10);
+——————+——————–+
| gen_range(1, 10) | gen_range(-10, 10) |
+——————+——————–+
| 8 | -6 |
+——————+——————–+
The gen_rnd_email() function generates a random email in the example.com domain. An example:
mysql> SELECT gen_rnd_email();
+—————————+
| gen_rnd_email() |
+—————————+
| eptgi.dvxgjae@example.com |
+—————————+
The function generates a different email each time it is called:
mysql> SELECT gen_rnd_email();
+—————————+
| gen_rnd_email() |
+—————————+
| holxd.uyohqpw@example.com |
+—————————+
The gen_rnd_ssn() function returns a random U.S. social security number in the format xxx-xx-xxxx. As example:
mysql> SELECT gen_rnd_ssn();
+—————+
| gen_rnd_ssn() |
+—————+
| 908-39-3695 |
+—————+
Each time the function is called, it returns a different social security number:
mysql> SELECT gen_rnd_ssn();
+—————+
| gen_rnd_ssn() |
+—————+
| 978-61-1948 |
+—————+
The gen_rnd_us_phone() function returns a random U.S. phone number in the format 1-555-xxx-xxxx. As an example:
mysql> SELECT gen_rnd_us_phone();
+——————–+
| gen_rnd_us_phone() |
+——————–+
| 1-555-131-4700 |
+——————–+
The 555 area code is used because it is not used in any actual phone number. The function returns a different phone number when called again:
mysql> SELECT gen_rnd_us_phone();
+——————–+
| gen_rnd_us_phone() |
+——————–+
| 1-555-717-2638 |
+——————–+
Table design upgrade
Data in a production environment grows in size and evolves in characteristics over time. It is not uncommon that a database table needs to be redesigned by adding/removing a column, changing a column’s data type, and such. When a production environment table is offline for design upgrade, the table data doesn’t need to be preserved.
For table design, the table data may be masked or obfuscated using the same masking functions as discussed before. Even though we used numbers in the prior examples, general purpose functions may be used for any string data. An example of obfuscating a string is as follows:
mysql> SELECT mask_inner('This is an example string', 1, 2);
+———————————————–+
| mask_inner('This is an example string', 1, 2) |
+———————————————–+
| TXXXXXXXXXXXXXXXXXXXXXXng |
+———————————————–+
In fact, the complete string may be masked by setting the margin arguments to 0.
mysql> SELECT mask_inner('This is an example string', 0, 0);
+———————————————–+
| mask_inner('This is an example string', 0, 0) |
+———————————————–+
| XXXXXXXXXXXXXXXXXXXXXXXXX |
+———————————————–+
Similarly, the mask_outer() function could be used for masking the outer sections of a string. An example:
mysql> SELECT mask_outer('This is an example string', 10, 10);
+————————————————-+
| mask_outer('This is an example string', 10, 10) |
+————————————————-+
| XXXXXXXXXX examXXXXXXXXXX |
+————————————————-+
Data shredding
Data masking could be used for data shredding sections of data. The SQL DELETE statement could also be used to delete data, but it deletes a complete data record.
Providing hints
Data masking could be used to provide hints in a quiz, crossword puzzles, and such. As an example, a hint for a correct answer Abracadabra could be provided by hiding some of the letters as:
mysql> SELECT mask_inner('Abracadabra', 2, 1);
+———————————————–+
| mask_inner('Abracadabra', 2, 1) |
+———————————————–+
| AbXXXXXXXXa |
+———————————————–+
What not to use data masking for
While we discussed how data masking could be used, it should be mentioned that data masking is not always suitable.
Displaying or storing passwords
Data masking should not be used for storing passwords. Hashing functions are provided for storing passwords.
Data encryption or data encoding
Data masking should not be used for encrypting or encoding data. Encryption and encoding functions are provided for data encryption and encoding.
Fixing data
While some data masking functions could be used for fixing data, data masking is not designed for the purpose of fixing data. An example where you may consider using the mask_inner() masking function to replace an erroneous ‘/’ in a phone number with a ‘-’ is as follows:
mysql> SELECT mask_inner('555/555-5555', 3, 8,'-');;
+———————————-+
| mask_inner('555/555-5555', 3, 8,'-');|
+———————————-+
| 555-555-5555 |
+———————————-+
Keeping the only copy of data
Data masking should not be used for storing the only copy of a data. Consider that you developed a new software program, and you want to store the program such that others may not find and use the program before you have tested and debugged the program. You may obfuscate the code with data masking before saving its only copy. But you would have lost the software program as data masking is irreversible. Data masking is not the same as data encryption and decryption.
In this article, we discussed data masking in SQL, when it could be used and when it should not be used.
Try Toad free for 30 days.Already in a trial? Talk to sales or buy now online. Already a loyal fan of Toad for Oracle? Renew now. |
Related Links
Blog: Code analysis: Why PL/SQL code quality matters
Blog: SQL query optimization: Level up your SQL performance tuning
Blog: Supported Oracle 21c new features in Toad for Oracle 15.1
Blog: Best developer tool: An Oracle Ace's favorite things in Toad for Oracle Base
Blog: Dark mode has arrived for Toad for Oracle 15.1
Blog: SQL tools – 8 ways you can’t live (or work) without them
Have questions, comments?
Head over to the Toad for Oracle forum on Toad World®! Chat with Toad developers, and lots of experienced users.
Help your colleagues
If you think your colleagues would benefit from this blog, share it now on social media with the buttons located at the top of this blog post.
Start the discussion at forums.toadworld.com