At one time, when we didn’t have much data, most of what we did have was considered either essential or very valuable indeed: accounts, legal documents, receipts, orders, medical records – you get the picture. Because we couldn’t generate, store or process much information, that which we did generate, store and process had real importance. Not only was it important that the data was retained; it was also important that it was consistent for all viewers of that data at any point in time, and that it didn’t get lost.
As computers advanced, much of early computing focused on automating this manual data processing.
Then came the internet.
And with it came a vast amount of new data, much of it seemingly trivial. This seemingly trivial data falls into the Realm of Dispensable Data. The Oxford English Dictionary defines “dispensable” with several meanings, the most salient to this article being “That can be dispensed with or done without; unessential, omissible; unimportant.”
It struck me that much of the driving force behind many Big Data projects built upon NOSQL document stores is the desire to harness dispensable data and derive value from it. So what exactly constitutes “Dispensable Data”?
In my opinion, data is dispensable when it meets all of the following criteria:
The answers to these three criteria may differ depending on who is asking the question and why they are asking it.
Enough of the theory. Here are some examples:
As you can see from the examples above, the advent of the internet and, in particular, the advent of social media has introduced a new category of dispensable data.
Dispensable data is the sweet spot for many of the NOSQL document stores, many of which don’t offer transactional consistency. Instead, they offer what the vendors of these products call “eventually consistent” models. Furthermore, because much of this data is free-form text (blogs, tweets, etc.), there is very little processing that can be usefully applied to it. This also suits document stores.
Unfortunately, many a keen developer, in the rush to embrace these technologies, will choose a document store for data that is not dispensable. When a non-transactional data store is used where transactions are required, it is only a matter of time before real money is lost.
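To make that last point concrete, here is a minimal sketch of how money can go missing. It is an illustration under stated assumptions, not a model of any particular product: a plain Python dict stands in for a store with no multi-record transactions, SQLite stands in for a transactional database, and the names SimulatedCrash, transfer_without_transaction and transfer_with_transaction are hypothetical ones invented for this example. Both transfers are interrupted between the debit and the credit.

```python
import sqlite3


class SimulatedCrash(Exception):
    """Hypothetical failure that strikes between the debit and the credit."""


def transfer_without_transaction(store, src, dst, amount, fail_midway=False):
    # Two independent writes: nothing ties the debit to the credit.
    store[src] -= amount
    if fail_midway:
        raise SimulatedCrash("failed after debit, before credit")
    store[dst] += amount


def transfer_with_transaction(conn, src, dst, amount, fail_midway=False):
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if an exception escapes it.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        if fail_midway:
            raise SimulatedCrash("failed after debit, before credit")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )


# A dict standing in for a non-transactional store (an assumption).
store = {"alice": 100, "bob": 0}
try:
    transfer_without_transaction(store, "alice", "bob", 30, fail_midway=True)
except SimulatedCrash:
    pass
dict_total = sum(store.values())

# The same interrupted transfer against a transactional database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
)
conn.commit()
try:
    transfer_with_transaction(conn, "alice", "bob", 30, fail_midway=True)
except SimulatedCrash:
    pass
sql_total = conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]

print("total held in the dict store:", dict_total)
print("total held in SQLite:", sql_total)
```

In the dict version the debit survives the failure and the credit never happens, so the money simply vanishes from the system. In the SQLite version the half-finished transfer is rolled back and the total is unchanged. For a tweet or a blog comment, the first behaviour may be perfectly acceptable; for an account balance, it is not.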