Wednesday, 13 September 2017

Databergs…Are You On Course for a Titanic Sinking?

By Will Lambert, Pre/Post Sales Cyber Security Consultant, ZeroDayLab Limited
As you power through the globe’s business oceans, navigating waves of regulatory and legal compliance, are you on course for a business-sinking crash with a databerg? The captain of your ship probably knows databergs are there, but are you adequately prepared to prevent or reduce the damage a databerg could inflict?

This is not going to be another blog talking about EU GDPR, but it is on the horizon and it would therefore be irrational for me not to mention it. Under EU GDPR, Article 5(1)(c) (the data minimisation principle) requires personal data to be adequate, relevant and limited to what is necessary; in short, you must not hold excessive amounts of data. Be aware, even if you deal mainly with the UK, this is likely to be UK law long after Britain sets sail from the EU in March 2019. According to a statement by Elizabeth Denham, UK Information Commissioner at the ICO in Cheshire:

“The big question is what happens when the UK leaves the EU. The legal relationship answers are for government to give – I’m a regulator, independent of government - but they’ve made it clear that EU law will remain UK law, until the Government sees fit to repeal it.”

This is not just about EU GDPR anymore; look past March 2019. Steps need to be taken as soon as possible to chip away at your databerg so that it shrinks to a more manageable and preferably fine-avoiding, or at least fine-reducing, size.

What is Your Databerg?
When discussing data, there are three main types: Structured, Unstructured and Semi-Structured. Simply put, Structured data is the kind held in relational databases, where it can be indexed and investigated using search strings. Overall, structured data is something machines can easily understand. Unstructured is almost everything else. Think of unstructured data as data that is written or presented for humans to understand easily. Muddying the waters a little, Semi-Structured data is made up of Unstructured documents that allow indexing and investigation by search strings, usually by adding tags. This tagging element (usually metadata) allows specific elements of the data to be addressed and located using search strings. Common metadata includes Name, Date Created and Owner.
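To make the semi-structured idea concrete, here is a minimal sketch of unstructured documents that become searchable once metadata tags are attached. The `Document` class, the `find_by_tag` helper and the sample records are all hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """An unstructured body of text plus optional searchable metadata tags."""
    body: str
    metadata: dict = field(default_factory=dict)


def find_by_tag(documents, key, value):
    """Return the documents whose metadata tag `key` equals `value`."""
    return [doc for doc in documents if doc.metadata.get(key) == value]


# Hypothetical sample data: two tagged documents and one with no tags at all.
documents = [
    Document("Minutes of the board meeting...", {"Owner": "finance", "Date Created": "2016-03-01"}),
    Document("Holiday rota for the support desk", {"Owner": "hr", "Date Created": "2017-01-15"}),
    Document("Untitled notes with no tags at all"),
]

finance_docs = find_by_tag(documents, "Owner", "finance")
```

The untagged document can never be returned by a tag search, which is exactly why untagged, unstructured data is the hardest part of a databerg to chart.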

85% of Your Data is of No Use To Your Organisation
The reality is that only a small proportion of data is readily seen or used by organisations. In fact, the Veritas Databerg Survey 2016 states that a huge 85% of stored data is Dark, Redundant, Obsolete or Trivial! 85% of your data is essentially of no use to your organisation. A huge proportion of your data is potentially being stored and maintained for no business gain and, if it’s not removed, it could tear a hole in your hull, letting a flood of fines sink your business.

I can almost hear you shouting at me, “but we have to keep certain data!” Yes, that’s true. You do have certain regulatory and legal requirements for keeping data, but without performing detailed and meticulous data analytics and discovery, how do you know what you need to keep and what can be, or needs to be, removed? That is why it is important to locate both Dark data and Redundant, Obsolete and Trivial (ROT) data.

The Value of Dark Data is Unknown to Your Organisation
Dark data is data you do not even know exists, and therefore neither a quantitative nor a qualitative value can be attributed to it. It is also data you do know is there but are unsure what it is. The tradition has been to ‘keep it, just in case’ for data where you are unable to identify the person or role responsible for it. However, this reasoning is unlikely to be accepted under EU GDPR.

How Much Duplicated Data Exists?
Redundant data means asking what level of duplication exists. Do you really need to keep file X five times? Furthermore, what backups have been carried out on the system, and what level of duplication exists within those backups? Many businesses are backing up the same areas or files more often than is necessary.
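One common way to answer the “file X five times” question is to group files by a hash of their contents. The sketch below is only an illustration under that assumption, not a description of any particular discovery tool; the function names `file_digest` and `find_duplicates` are made up for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list:
    """Group files under `root` that have byte-identical content."""
    by_digest = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only groups with more than one member are duplicates.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates(Path("/some/share"))` (a placeholder path) returns one list per set of identical files. Running the same check against a restored backup would show how much duplication the backups themselves carry.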

What is Past its Use-By Date?
Obsolete data is identified by analysing its ageing characteristics. If a file was created in 2005 and has not been modified since 2009... does it still need to be kept? Be aware, this range of “obsolete” files would require consultation with Legal before removal, to stay in line with regulatory and legal requirements.
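As a rough sketch of that ageing check, the snippet below lists files whose last-modified time is older than a chosen threshold. The function name `stale_files` and the idea of a single age threshold are assumptions for illustration; the right retention period is a question for Legal, not for a script. It uses modification time only, because file creation time is not reported consistently across operating systems.

```python
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # approximation that ignores leap years


def stale_files(root: Path, max_age_years: float, now: Optional[float] = None) -> list:
    """List files under `root` last modified more than `max_age_years` ago."""
    reference = time.time() if now is None else now
    cutoff = reference - max_age_years * SECONDS_PER_YEAR
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

The output is a candidate list for review, not a deletion list.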

Trivial or Vital?
You should have a good idea of what file extensions should make up your databerg and what is classed as ‘trivial’ and unimportant.  If your business produces mainly documents, you could argue that picture files are required but what about film and audio files? If they are not important, chip these file types off your databerg!
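A simple way to see which file types make up your databerg is to total the stored bytes per file extension. This is a minimal sketch; the `TRIVIAL_EXTENSIONS` set is a hypothetical example, and each organisation would decide its own list.

```python
from collections import Counter
from pathlib import Path

# Hypothetical list of extensions a document-focused business might class as trivial.
TRIVIAL_EXTENSIONS = {".mp3", ".mp4", ".avi", ".mov", ".wav"}


def bytes_by_extension(root: Path) -> Counter:
    """Total the size in bytes of files under `root`, keyed by lower-cased extension."""
    totals = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            totals[path.suffix.lower()] += path.stat().st_size
    return totals


def trivial_bytes(totals: Counter, trivial=TRIVIAL_EXTENSIONS) -> int:
    """Sum the bytes held in extensions classed as trivial."""
    return sum(size for ext, size in totals.items() if ext in trivial)
```

Comparing `trivial_bytes(...)` with the overall total gives a first estimate of how much could be chipped off by file type alone.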

Where do You Start the Journey?
A decent data discovery exercise will not only show you what data you have and where it is, but will also provide ageing statistics showing how much data you have at present and how much you had in the past. From those statistics, you can project your organisation’s future databerg if nothing is done. I’m sure you will agree that this level of analytics will come in extremely handy when writing business cases for the tools needed to scale and hack away at your company’s databerg.
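The projection step can be sketched with nothing more than a compound growth rate. The example assumes you already have one total per year from a discovery exercise and that past growth continues unchanged, which is a deliberately crude assumption; the function names are made up for illustration.

```python
def average_growth_rate(yearly_totals: list) -> float:
    """Compound annual growth rate between the first and last observed totals."""
    if len(yearly_totals) < 2 or yearly_totals[0] <= 0:
        raise ValueError("need at least two totals and a positive starting value")
    years = len(yearly_totals) - 1
    return (yearly_totals[-1] / yearly_totals[0]) ** (1 / years) - 1


def project(yearly_totals: list, years_ahead: int) -> list:
    """Project future totals, assuming the historical growth rate simply continues."""
    rate = average_growth_rate(yearly_totals)
    latest = yearly_totals[-1]
    return [latest * (1 + rate) ** n for n in range(1, years_ahead + 1)]
```

Feeding in, say, the last five years of totals in terabytes gives a “do nothing” trend line to put alongside the cost of the tooling in a business case.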

Reduction is not just about compliance with EU GDPR; it also affects budgets. Certain expenditures will always be associated with data. Power and storage costs are just the start. Storage is becoming cheaper by the day, but why pay for X when you only need half the amount?
Plus, if the 85% figure is to be believed... well, those are fractions I can’t do, but we can agree that’s a massive saving. What if your organisation makes use of Infrastructure as a Service (IaaS) cloud storage solutions? Cloud providers tend to charge between $26 and $155 pcm per user for 1TB of storage. Reduce this down to 100GB and the charge falls to between $4 and $33 pcm. Again, it’s quite the saving. (All prices compared at
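For anyone who does want the fractions, the arithmetic is short: if 85% of the data is of no use, 15% remains. The sketch below works through that and the price difference quoted above. Treating 1TB as 1,000GB and the example user count are assumptions for illustration only.

```python
def remaining_after_cleanup(total_gb: float, unused_fraction: float) -> float:
    """Storage still needed once the unused fraction has been removed."""
    return total_gb * (1 - unused_fraction)


def monthly_saving(price_before: float, price_after: float, users: int = 1) -> float:
    """Monthly saving from moving every user to the cheaper storage tier."""
    return (price_before - price_after) * users


# Assumption: 1TB is treated as 1,000GB.
needed_gb = remaining_after_cleanup(1000, 0.85)

# Prices quoted in the text: $26-$155 pcm for 1TB, $4-$33 pcm for 100GB.
low_saving = monthly_saving(26, 4)
high_saving = monthly_saving(155, 33)
```

At the quoted prices that is a saving of between $22 and $122 per user per month. Note that 15% of 1TB is roughly 150GB, so the 100GB tier assumes a little more is trimmed than the 85% figure alone implies.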

Identifying the content of your databerg is the first step, and it must be completed before grabbing crampons and ice axes to begin ploughing through the data. Remember, if you find yourself in the unfortunate position of being breached, having carried out data discovery and taken steps to reduce your databerg can only ever be looked upon favourably by any governing body. The moral of the story is: switch on the radar and identify the size of the risk. While it’s a long, arduous and monotonous journey, it’s better to reduce the databerg than to let your organisation be the next Titanic sinking.