10 ITadviser Autumn 2009
data protection
investigating system problems. This system should also handle the
extracts of data from production systems that are routinely sent
to other agencies. Several examples of this occurred in 2008 when
whole data files were sent to other agencies on disc.
Fundamentally there are two possible scenarios here:
• the data was needed for testing, in which case it should have
been obfuscated and encrypted; or
• the other agency had a legitimate right to use the live data, in
which case the data need only have been encrypted.
In either case only the data required should have been extracted
and not the full data set. Further protection could be provided by
obfuscating data. Obfuscating data as a batch activity is relatively
simple to achieve. However, IT support staff may still need to access
sensitive personal data while diagnosing system problems, and that
issue is much harder to solve.
This is made more complex if it is the personal data that is causing
the problem that the support staff are trying to help the end-user
solve. In such cases end-users will always need support staff with
the right clearance to look at live personal data.
There is also a need for a higher level of security in relation to
the usage of production data in non-production environments. The
previous practice of copying sets of production data into
development and test environments to support application delivery
activities should no longer be considered as best practice.
There are two solutions to the problem of creating data to
support test and development activities:
Data obfuscation (which is also sometimes referred to as data
anonymisation, data masking, data privacy or data scrambling): the
test data is built from a sub-set of the production data that has
been subject to a number of techniques designed to obscure the
origin of the data. Specifically those techniques must prevent
personally identifiable information or sensitive information from
being identified from data. The techniques must not allow the
original data to be re-created by reverse engineering.
Data synthesis: the data is created from first principles based
on the test data requirements generated by the associated test
strategy and test cases.
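As an illustration of data synthesis, the following is a minimal sketch in Python that builds records from first principles without reading any production data. The field names, the name lists and the value ranges are hypothetical; in practice they would be derived from the test strategy and test cases.

```python
import random
from datetime import date, timedelta

# Hypothetical value pools; real ones would come from the test requirements.
FIRST_NAMES = ["Alex", "Sam", "Jo", "Chris", "Pat"]
LAST_NAMES = ["Smith", "Jones", "Taylor", "Brown", "Wilson"]


def synthesise_customers(count, seed=0):
    """Build `count` customer records from scratch; no production data is read."""
    rng = random.Random(seed)  # seeded so the same test data can be regenerated
    earliest = date(1940, 1, 1)
    records = []
    for customer_id in range(1, count + 1):
        records.append({
            "customer_id": customer_id,
            "first_name": rng.choice(FIRST_NAMES),
            "last_name": rng.choice(LAST_NAMES),
            "date_of_birth": earliest + timedelta(days=rng.randrange(0, 22000)),
            "balance_pence": rng.randrange(0, 1_000_000),
        })
    return records


customers = synthesise_customers(3)
for row in customers:
    print(row)
```

Because nothing is copied from live systems, there is no original value to recover. The cost is that the generator has to be taught every rule the application expects the data to obey.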
The focus should therefore be to create a solution to allow the
batch obfuscation of data files for use either in test and
development environments or for transmission to external third
parties. In addition, even obfuscated data should be treated as
potentially sensitive and should therefore be subject to Information
Lifecycle Management (ILM). Thus, there should be clear policies
around creation, retention, backup / restore and secure destruction
of the created data sets.
Whilst data obfuscation is the preferred approach, some
organisations are interpreting the guidance on the DPA very strictly
and are not allowing test datasets that are based on anonymised
production data at all. Often this is on the basis of the ICO's view
that true anonymisation [1] cannot be achieved if it is possible for the
data controller to identify individuals from the data. For those
organisations, data synthesis is the only option.
Data Obfuscation Techniques
As already stated, test and development teams need to work with
databases which are structurally correct functional copies of the
live environments. However, they do not need to be able to view
security sensitive information for test and development purposes.
As long as the data looks real, the actual record content is usually
irrelevant.
Given the legal and business operating environment of today,
most organisations will require some form of data obfuscation. A
variety of techniques is available, and usually several will be
required, as the format, size and structure of the data dictate:
• Nulling Out
• Masking Data
• Data Substitution
• Shuffling Field Records
• Number Variance
• Synthetic Data
• Gibberish Generation
• Masking data using lookup values
• User-defined masking routines
Encryption/Decryption is not listed here as it is a different
approach to securing data than data obfuscation. There is
generally a need to apply multiple techniques to obfuscating data
depending upon the expected use.
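To make some of the listed techniques concrete, here is a minimal Python sketch of nulling out, masking, substitution, number variance and shuffling, applied together to a small batch of records. The record layout, the replacement names and the 10% variance limit are all assumptions chosen for illustration.

```python
import random

# Hypothetical replacement values for the substitution technique.
FAKE_NAMES = ["Person A", "Person B", "Person C", "Person D"]


def null_out(_value):
    """Nulling out: remove the value entirely."""
    return None


def mask(value, visible=4, mask_char="X"):
    """Masking: hide all but the last `visible` characters."""
    if len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def substitute(_value, replacements, rng):
    """Data substitution: swap the value for one drawn from a replacement list."""
    return rng.choice(replacements)


def vary_number(value, rng, max_fraction=0.10):
    """Number variance: shift a number by a random amount within +/- max_fraction."""
    return round(value * (1 + rng.uniform(-max_fraction, max_fraction)), 2)


def shuffle_field(rows, field, rng):
    """Shuffling: redistribute one field's values across the rows."""
    values = [row[field] for row in rows]
    rng.shuffle(values)
    return [{**row, field: new} for row, new in zip(rows, values)]


rows = [
    {"name": "Alice Example", "card": "4000123412341234", "salary": 30000.0,
     "town": "Leeds", "notes": "free-text note"},
    {"name": "Bob Example", "card": "4000567856785678", "salary": 42000.0,
     "town": "Bristol", "notes": "free-text note"},
    {"name": "Carol Example", "card": "4000999988887777", "salary": 51000.0,
     "town": "Cardiff", "notes": "free-text note"},
]

rng = random.Random(42)
obfuscated = shuffle_field(rows, "town", rng)
obfuscated = [
    {**row,
     "name": substitute(row["name"], FAKE_NAMES, rng),
     "card": mask(row["card"]),
     "salary": vary_number(row["salary"], rng),
     "notes": null_out(row["notes"])}
    for row in obfuscated
]
for row in obfuscated:
    print(row)
```

Each technique suits a different kind of field. Masking still reveals the trailing characters, and shuffling keeps every real value in the data set, so neither is likely to be sufficient on its own.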
One key concern is repeatability. When designing the data
obfuscation routines, developers need to realise that they will
eventually become a production process, even if the data is only
destined for test environments. In other words, the data will need
to be obfuscated each and every time a test database is refreshed
from production, which typically occurs on a periodic basis (e.g.
daily or weekly). This means that data obfuscation routines that
are easy to run and simple to maintain will soon recover any extra
development effort or costs. However, there is a potential danger
that a repeatable process could allow the original data values to be
reconstructed from the obfuscated data. To reduce this danger, it is
often advisable to use multiple obfuscation techniques, as this
makes it harder to reconstruct the original data.
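One way to reconcile repeatability with the reconstruction risk is a keyed substitution. This technique is not described in the article and is offered only as a sketch. Each original value selects a replacement through an HMAC computed with a secret key, so every refresh produces the same output. The mapping cannot be recomputed without the key, and because many originals share a replacement, the output alone does not determine the input. The key, the replacement list and the function name are hypothetical.

```python
import hashlib
import hmac

# Hypothetical replacement pool; a real one would be much larger.
REPLACEMENT_SURNAMES = ["Archer", "Baker", "Carter", "Draper", "Fletcher", "Mason"]


def repeatable_substitute(value, secret_key, replacements):
    """Pick the same replacement for the same input on every run.

    The choice depends on an HMAC of the value, so it cannot be
    recomputed by someone who does not hold `secret_key`.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(replacements)
    return replacements[index]


# Assumption: the key is stored outside the test environment.
key = b"example-key-kept-outside-the-test-environment"

first_refresh = repeatable_substitute("Ramsbottom", key, REPLACEMENT_SURNAMES)
second_refresh = repeatable_substitute("Ramsbottom", key, REPLACEMENT_SURNAMES)
print(first_refresh == second_refresh)  # True
```

The protection rests on the key: anyone holding both the key and a list of candidate originals could rebuild the mapping, so the key needs the same care as the live data.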
Evaluating the Effectiveness of Data Obfuscation
Data Obfuscation techniques can be classified by a number of
criteria:
• Usefulness: measures how appropriate the obfuscated data set is
for use after it has been changed.
• Potency: measures how much time, effort and skill an attacker
requires to understand, and remove, the obfuscation construct.
• Resiliency: measures how much time, effort and skill an attacker
would expend writing a program to automatically de-obfuscate a
construct, and the resources that program requires to run. Increasing
the resiliency will help prevent automated de-obfuscation of data.
• Cost: measures the impact that achieving the previous two criteria
has on the execution of the development/testing program; for example,
methods that cause huge memory usage or require long execution
times to create obfuscated data can be said to have a large cost.
Metrics allow the usefulness of particular obfuscation constructs
to be objectively rated and allow users to decide if and when to
include them. It is important that this is not done in the abstract
but rather in the context of the kind of scenarios in which the data
will be used, as the following scenarios illustrate.
Scenario 1: Testing an Application that requires data validation
The best approach is to create synthetic data. However, this is
typically very complex, so data substitution is easier to use: it
ensures that data is obfuscated but can still be used to test
validation routines. Note that different tools provide different
capabilities. For example, some tools come with out-of-the-box
knowledge of well-known COTS applications such as Oracle E-Business
Suite, including their data validation and relationship rules, so it is
easier to apply obfuscation and know that the business rules in the
application will still work.
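As a sketch of substitution that survives validation, suppose the field under test is a number protected by a Luhn check digit. That is an assumption made for illustration; the article does not name a specific field. The substitute below is built from random digits with a freshly computed check digit, so the application's validation routine still accepts it.

```python
import random


def luhn_check_digit(payload):
    """Return the Luhn check digit for a string of digits (check digit excluded)."""
    total = 0
    # Walk right to left; the digit nearest the check digit is doubled first.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def is_luhn_valid(number):
    """Stand-in for the application's own validation routine."""
    return (number.isdigit() and len(number) > 1
            and luhn_check_digit(number[:-1]) == number[-1])


def substitute_number(original, rng):
    """Replace a number with random digits of the same length and a valid check digit."""
    payload = "".join(rng.choice("0123456789") for _ in range(len(original) - 1))
    return payload + luhn_check_digit(payload)


rng = random.Random(7)
replacement = substitute_number("4111111111111111", rng)  # well-known test number
print(replacement, is_luhn_valid(replacement))
```

The same pattern applies to any field with a checksum or format rule: generate the replacement so that it obeys the rule, rather than scrambling the original and hoping it still validates.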
Scenario 2: Securing a data set for statistical analysis
A number of techniques support obfuscation of personal data
while retaining the original numerical data for statistical analysis.
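One simple reading of this scenario, sketched below in Python, is to drop the directly identifying fields and leave the numerical fields untouched, so that aggregate statistics are unchanged. The field names and sample values are hypothetical. Removing direct identifiers does not by itself guarantee anonymity, since the remaining fields may still identify someone in combination.

```python
from statistics import mean

# Hypothetical list of fields that directly identify a person.
IDENTIFYING_FIELDS = {"name", "reference_no", "address"}


def strip_identifiers(rows, identifying_fields):
    """Drop identifying fields; numerical fields pass through unchanged."""
    return [
        {key: value for key, value in row.items() if key not in identifying_fields}
        for row in rows
    ]


rows = [
    {"name": "Alice Example", "reference_no": "EXAMPLE-1", "address": "1 Sample Road",
     "age": 34, "claim_value": 1200.0},
    {"name": "Bob Example", "reference_no": "EXAMPLE-2", "address": "2 Sample Road",
     "age": 51, "claim_value": 800.0},
    {"name": "Carol Example", "reference_no": "EXAMPLE-3", "address": "3 Sample Road",
     "age": 47, "claim_value": 1000.0},
]

analysis_rows = strip_identifiers(rows, IDENTIFYING_FIELDS)

# The statistic is identical before and after obfuscation.
print(mean(r["claim_value"] for r in rows) == mean(r["claim_value"] for r in analysis_rows))
```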
Scenario 3: Backup of Test Data
The best approach is data substitution as it ensures that data is
obfuscated but can still be used to test validation routines.
However, data encryption can also be used to provide an additional
level of protection for stored data.
Recommendations
Data obfuscation should be a key part of all HM Government
development and testing to assist in significantly reducing data
security risks. It is EDS's view that, at the current time, many
clients do not seem to be aware of the potential risks that may arise
from the use of non-obfuscated data in development projects involving
internal or third party staff. Third party suppliers to government
clients also need to understand the potential impact of using live
data in development and testing. Don't just raise problems: demonstrate
the use of obfuscated or synthesised data in development or test
environments.
Reference
[1] See ICO Legal Guidance on the DPA, Section 2.2.5, at www.ico.gov.uk