This category contains 3 posts

Optimal Compressed Data File Strategies for HDInsight and Azure Data Lake

HDInsight (Microsoft’s canned Azure Hadoop offering) and Azure Data Lake are competing Azure services that share many features yet differ in significant ways.

One significant difference between the two platforms is their ability to process compressed file formats. This article looks at the similarities and differences between the two and attempts to formulate strategies for getting the maximum performance from each platform.


Comparing HDInsight with Azure Big Data Services

Microsoft offers both the Hadoop ecosystem on Azure (which it collectively calls HDInsight) and a range of Azure Big Data services. This article compares and contrasts these technologies and suggests reasons why you might choose one over another. Equivalent Technologies: Below is a simplified matrix showing the Azure Big Data …

Database Partitioning: Goals and Strategies

I’ve recently been involved in a number of projects requiring some complex database partitioning strategies. The partitioning strategies chosen were not immediately obvious – in fact they were designed in an iterative manner, taking into account a number of different requirements which, at first, seemed somewhat contradictory, but with refinement of the requirements resulted in …