Transform before you migrate
It seems obvious, but migrating to the cloud can be a frustrating process if your data is not suitably transformed before it enters the ETL process.
Optimise data processing
As data volumes grow larger, formats more varied and growth exponential, data transformation techniques are vital. Routine transformations bring source data into line with the format the target destination expects. More importantly, data transformation aims to optimise the efficiency of downstream data processing and analytics tasks through techniques such as:
Filter & aggregate, slice & dice
When confronted with massive volumes of structured data, the task must be a reducing one: filter out anything unnecessary. Aggregation is both an enriching and a reducing technique, often driven by the need to answer a business question that requires combining datasets to produce information, not just data.
Or, for BI, the business may want to look at data from a different perspective, e.g. sales figures by product nationally rather than branch totals. From our BI experience, filtering, aggregating or summarising should precede reporting: these steps reduce the load on the server and enable faster, more accurate reporting.
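By way of illustration, here is a minimal pandas sketch of that filter-then-aggregate step, re-slicing branch-level sales into national figures per product. The table, column names and filter condition are all hypothetical:

```python
import pandas as pd

# Hypothetical branch-level sales extract; names are illustrative.
sales = pd.DataFrame({
    "branch":  ["Leeds", "Leeds", "York", "York", "York"],
    "product": ["A", "B", "A", "B", "C"],
    "units":   [120, 45, 80, 60, 0],
    "revenue": [2400.0, 1350.0, 1600.0, 1800.0, 0.0],
})

# Filter: drop rows irrelevant to the question (here, zero-unit lines).
active = sales[sales["units"] > 0]

# Aggregate: roll branch totals up into national figures per product.
national = (
    active.groupby("product", as_index=False)
          .agg(total_units=("units", "sum"),
               total_revenue=("revenue", "sum"))
)
print(national)
```

Because the filtering and aggregation happen before the report is built, the reporting layer only ever handles the small, pre-summarised table.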
Prepare for BI, AI or ML
Before any BI, AI or ML project, use data transformation techniques to get the data into a state the downstream models and reports can actually use.
Data scientists can spend up to 80% of their time preparing data for AI pipelines instead of learning from it. They have to, because AI and ML models can only learn from the data they are fed. Don’t turn your data scientists into data janitors!
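As a rough sketch of codifying that preparation work once, rather than redoing it by hand per project, the example below uses pandas and scikit-learn to deduplicate, impute missing values and standardise scales. The feature names and imputation strategy are assumptions for illustration, not a prescription:

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical raw feature table with gaps and an exact duplicate row.
raw = pd.DataFrame({
    "age":    [34, None, 29, 51, 34],
    "income": [52000.0, 61000.0, None, 88000.0, 52000.0],
})

# Drop exact duplicates before any statistical steps.
clean = raw.drop_duplicates()

# A repeatable preparation pipeline: fill gaps, then rescale.
prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # zero mean, unit variance
])
features = prep.fit_transform(clean)
```

Capturing these steps in a pipeline means the same transformations run identically in training and production, which is exactly the janitorial work best taken off the data scientists' desks.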