Digital Landfill, Data Lake, or Missed Opportunity?

Download the Data Lake Datasheet

Many of us do it: we keep something that we don’t really need, “just in case”. Whether it is a fancy shirt that has never really fitted us, a kitchen pan, or a book that we will likely never get to read, this desire to hang on to things is part of human nature. But in a business context this hoarding mentality can be extremely damaging and costly.

This 2008 presentation from former AIIM CEO John Mancini highlights what digital landfill contains: all of the PDFs, scans, draft documents, spreadsheets and general digital “stuff” that gets stored on your corporate file shares. But other than taking up a lot of space, what is the issue with this? Well, there are many:

  1. You can’t control what you don’t understand
    Governance, compliance and regulation are never pleasant things to deal with, but in a world where litigation is common, information security and personal privacy are becoming ever more important. The simple fact is that if you have no idea what information you are storing, you cannot effectively control and govern it. A digital landfill is a potential hacker’s playground.
  2. Searching gets harder
    We all know how hard it is to find information within our corporate systems. One reason is that there is so much information; another is that our digital landfills never get cleared out. We’re essentially searching through years and years of garbage to try to find what we are looking for. Is it any wonder search is difficult?
  3. You’re pouring money down the drain
    Salespeople all over the world will have you believe that storage is cheap. Storage is much cheaper than it used to be, but does that mean you should just store everything forever? No! Numerous research reports claim that around 80% of the content we store is redundant, obsolete or trivial (ROT). If that is the case and we could get rid of this ROT, we could save over ¾ of our storage costs, reduce our search times, reduce the amount of backup storage required, and so on. It’s not a difficult argument.

With so many issues caused by this approach, fixing the problem would seem an obvious priority. Yet many organizations simply don’t know where to start. The prospect of manually wading through this digital wasteland is not pleasant, largely because the wasteland is unmapped and enterprises lack the right tools to create a map and begin their journey. But help is at hand. No, it’s not a mirage that you see in this wasteland, but a lake: a data lake.

What is a Data Lake?

A data lake is quite simply a consolidated view of your content wasteland: a single place to view, access and begin work on cleaning up the landfill. It is not just a lake, however, but also a set of tools and techniques for exploring the lake, understanding its contents, and then taking action as a result.
Those actions could include:

  • The removal of ROT.
  • Identification of content that includes personally identifiable information (PII) such as names, addresses, credit card details and more – information that needs to be carefully managed and secured.
  • Identification and exposure of content and data to artificial intelligence engines for model training.
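
As a rough illustration of the first action, the removal of ROT, here is a minimal Python sketch of how candidate files might be flagged on a file share. It is only a sketch under stated assumptions, not how any particular data lake product works: the function names `file_digest` and `find_rot_candidates`, the five-year "obsolete" threshold and the "empty file is trivial" rule are all hypothetical choices made for illustration, and a real clean-up would follow your own retention policy.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path

# Hypothetical thresholds, chosen only for illustration.
OBSOLETE_AFTER_DAYS = 5 * 365   # untouched for roughly five years
TRIVIAL_BELOW_BYTES = 1         # i.e. empty files


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_rot_candidates(root, now=None):
    """Walk `root` and group files into redundant / obsolete / trivial.

    Redundant: byte-for-byte copies of an earlier file (same SHA-256).
    Obsolete:  last modified before the cutoff.
    Trivial:   smaller than TRIVIAL_BELOW_BYTES.
    """
    now = time.time() if now is None else now
    cutoff = now - OBSOLETE_AFTER_DAYS * 86400
    by_digest = defaultdict(list)
    obsolete, trivial = [], []

    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        if stat.st_size < TRIVIAL_BELOW_BYTES:
            trivial.append(path)
            continue
        if stat.st_mtime < cutoff:
            obsolete.append(path)
        by_digest[file_digest(path)].append(path)

    # Keep the first copy of each file; flag the rest as redundant.
    redundant = [p for paths in by_digest.values() for p in paths[1:]]
    return {"redundant": redundant, "obsolete": obsolete, "trivial": trivial}
```

Calling `find_rot_candidates("/path/to/share")` returns three lists of paths to review. The sketch only reports candidates and deliberately deletes nothing, since a file that looks like ROT may still be subject to a legal or regulatory retention requirement.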

The use cases are many, but each and every one needs a single view of enterprise content to be effective. That is the view provided by the data lake.

In short, a data lake is a multi-pronged attack on digital landfill, on information chaos, and on the hoarding mentality still at play within many organizations. Whether you want to use a data lake for clean-up purposes, to save money on storage, or to get more business value from your content, the opportunities exist to better control and command information moving forward. So what are you waiting for? Let’s go build a lake today!


Written by Mark Lugert, President and CEO, Simflofy