This category contains 5 posts

Database Partitioning: Goals and Strategies

I’ve recently been involved in a number of projects requiring complex database partitioning strategies. The strategies chosen were not immediately obvious; in fact, they were designed iteratively, taking into account a number of requirements that at first seemed somewhat contradictory but, once refined, resulted in …

Backups and Restores

Large databases can pose many problems, one of which is how to back up and (more importantly) restore them within the required Recovery Time Objective (RTO) while losing no more data than is practical (the Recovery Point Objective, RPO). I’ve recently had the opportunity to experiment with a wide variety of high-end storage, including solid-state cards, …

ETL Parallelization: Loading Fact Tables Using the Modulo Shredder

However many performance optimizations you put in place, you may eventually run up against a bottleneck: you need a shorter load time, but each further optimization simply will not deliver the improvement you require. If you’re in this situation, you may want to consider parallelizing …
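The excerpt only names the technique. As a rough sketch, assuming the modulo shredder assigns each fact row to one of N parallel load streams by taking a numeric key modulo N, the idea might look like this in Python. The names `shred`, `load_stream` and `parallel_load` are hypothetical, and `load_stream` merely stands in for a real bulk insert into the fact table.

```python
from concurrent.futures import ThreadPoolExecutor


def shred(rows, key, n_streams):
    """Split rows into n_streams buckets using key(row) % n_streams."""
    buckets = [[] for _ in range(n_streams)]
    for row in rows:
        buckets[key(row) % n_streams].append(row)
    return buckets


def load_stream(stream_id, rows):
    # Hypothetical stand-in for a bulk insert into the destination fact table;
    # here it just reports how many rows this stream would have loaded.
    return stream_id, len(rows)


def parallel_load(rows, key, n_streams=4):
    """Shred the rows, then load every bucket concurrently, one worker per stream."""
    buckets = shred(rows, key, n_streams)
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        # pool.map returns results in stream order, regardless of finish order.
        return list(pool.map(load_stream, range(n_streams), buckets))


if __name__ == "__main__":
    facts = [{"sale_id": i, "amount": i * 10} for i in range(10)]
    print(parallel_load(facts, key=lambda r: r["sale_id"], n_streams=4))
    # prints [(0, 3), (1, 3), (2, 2), (3, 2)]
```

Threads fit this sketch because a real load step would be largely I/O-bound; an evenly distributed integer key (such as a surrogate key) keeps the streams roughly balanced.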

ETL Optimisation: Part 3 – Load

In ETL Optimisation: Part 2 – Transformation, I looked at the Transformation process and simple ways of optimising it. In this part, I’m going to look at optimising the Load process. Consider the Location of the Source Relative to the Destination: to avoid repetition, the same arguments hold true for the destination as much as they do for the …

ETL Optimisation: Part 2 – Transformation

In ETL Optimisation: Part 1 – Extraction, I looked at data sources and the importance of optimising both the design of the data source and the process of obtaining data from it. In this part, I’m looking at ways of optimising the transformation of that data. So here are a few pointers when really …

ETL Optimisation: Part 1 – Extraction

I often get brought in to review and enhance existing systems exhibiting performance, scalability or resilience problems, some of which centre on poorly performing or unstable ETL (Extraction, Transformation and Load) subsystems. In this series of three articles, I’m going to focus on optimising these ETL processes, whilst remaining mindful of the trade-offs sometimes necessary …