This category contains 4 posts

ETL Parallelization: Loading Fact Tables Using the Modulo Shredder

For all the performance optimizations you may put in place, you may eventually run up against a bottleneck – you want a shorter load time, but each subsequent optimization simply does not deliver the performance increase you require. If you’re in this situation, you may want to consider parallelizing … Continue reading

ETL Optimisation: Part 3 – Load

In ETL Optimisation: Part 2 – Transformation, I looked at the Transformation process and simple ways of optimising it. In this part, I’m going to look at optimising the Load process. Consider the Location of the Source Relative to the Destination: to avoid repetition, the same arguments hold true for the destination as much as they do for the … Continue reading

ETL Optimisation: Part 2 – Transformation

In ETL Optimisation: Part 1 – Extraction, I looked at data sources and the importance of optimising the design of the data source as well as the process of obtaining the data source. In this part, I’m looking at ways of optimising the transformation of data sources. So here are a few pointers when really … Continue reading

ETL Optimisation: Part 1 – Extraction

I often get brought in to review and enhance existing systems exhibiting performance, scalability or resilience problems, some of which focus on poorly performing or unstable ETL (Extraction, Transformation and Load) subsystems. In this series of three articles, I’m going to focus on optimisation of these ETL processes, whilst being mindful of the trade-offs sometimes necessary … Continue reading