Big Data

This category contains 3 posts

Optimal Compressed Data File Strategies for HDInsight and Azure Data Lake

HDInsight (Microsoft’s canned Azure Hadoop offering) and Azure Data Lake are competing Azure offerings, with many similar features and yet significant differences.

One of the significant differences between the two platforms is their ability to process compressed file formats. This article looks at the similarities and differences between the two and attempts to formulate strategies to gain the maximum performance for each platform.


Comparing HDInsight with Azure Big Data Services

Microsoft offers both the Hadoop ecosystem on Azure (which it collectively calls HDInsight) and a range of Azure Big Data services. This article attempts to compare and contrast these technologies and to suggest reasons why you might choose one over another. Equivalent Technologies: Below is a simplified matrix showing the Azure Big Data …

NOSQL Document Stores: The Realm of Dispensable Data

At one time, when we didn’t have much data, most of what we did have was considered either essential, or very valuable indeed: Accounts, legal documents, receipts, orders, medical records – you get the picture. Because we couldn’t generate, store or process much information, that which we did generate, store and process had real importance: Not …