Archives

Ian Posner

Ian Posner is an independent consultant specialising in the design, implementation and troubleshooting of systems that demand the very highest performance and scalability.

Innovations in Analytics

There are many analytic engines and appliances out there. Yet many of these engines share common technology patterns: in-memory databases, transparent sharding and columnar storage all appear in many of these products. What I want to concentrate on in this article are particular products that bring something innovative to the table. Here …

Optimal Compressed Data File Strategies for HDInsight and Azure Data Lake

HDInsight (Microsoft’s canned Azure Hadoop offering) and Azure Data Lake are competing Azure offerings, with many similar features and yet significant differences.

One of the significant differences between the two platforms is their ability to process compressed file formats. This article looks at the similarities and differences between the two and attempts to formulate strategies to gain the maximum performance for each platform.

Comparing HDInsight with Azure Big Data Services

Microsoft offers both the Hadoop ecosystem on Azure (which it collectively calls HDInsight) as well as a range of Azure Big Data services. This article attempts to compare and contrast these technologies and to suggest reasons why you might choose one over another. Equivalent Technologies: Below is a simplified matrix showing the Azure Big Data …

Database Partitioning: Goals and Strategies

I’ve recently been involved in a number of projects requiring some complex database partitioning strategies. The partitioning strategies chosen were not immediately obvious – in fact they were designed in an iterative manner, taking into account a number of different requirements which, at first, seemed somewhat contradictory, but with refinement of the requirements resulted in …

NOSQL Document Stores: The Realm of Dispensable Data

At one time, when we didn’t have much data, most of what we did have was considered either essential, or very valuable indeed: Accounts, legal documents, receipts, orders, medical records – you get the picture. Because we couldn’t generate, store or process much information, that which we did generate, store and process had real importance: Not …

The NOSQL Landscape

There are a lot of products today marketing themselves as NOSQL. However, the more deeply one digs into the supported features of each, the more startling the differences become. Unlike RDBMSs, the feature set of NOSQL databases is radically different from one to another: In areas such as transaction support and scope, programmatic interfaces and …

SMB Direct: A Shake-Up for the Storage World

There’s a new technology on the block that’s going to shake up the storage world in a big way: SMB Direct. SMB Direct is built upon Remote Direct Memory Access (RDMA), which enables very low-latency connections between the memory of two computers without using the operating system of either. When this technology is embedded within network interface …

Backups and Restores

Large databases can pose many problems, one of which is how to back up and (more importantly) restore them within the required Recovery Time Objective while losing as little data as is practical (the Recovery Point Objective). I’ve recently had the opportunity to experiment with a whole variety of high-end storage, including solid-state cards, …

JSON v. XML: Is the new kid on the block really better?

The JSON data format is the (relatively) new kid on the block: It’s now becoming one of the most popular formats for data exchange, especially in the *nix world. Why the popularity? Well, just like XML, it’s human-readable. And just like XML, JSON’s hierarchical structure represents hierarchical data in a nice, easy-to-comprehend format. Just like …

ETL Parallelization: Loading Fact Tables Using the Modulo Shredder

For all the performance optimizations you may put in place, eventually you may run up against a bottleneck – you may want your load time to be reduced, but you find that each subsequent optimization simply will not deliver the performance increase you require. If you’re in this situation, you may want to consider parallelizing …