Extract, Transform, Load Process
The Extract, Transform, Load (ETL) process is diagrammed in the two images on this page. The screenshots display the flow, or data pipeline, of a typical project. Large businesses often adopt ETL software to assist. Some ETL tools include:



Databricks
Alteryx
Matillion
Pipelines can vary in size, complexity, style, and structure, but they all share the same goal: landing cleansed data somewhere, usually a data warehouse. A data warehouse is used for storing, organizing, and analyzing large volumes of data from multiple sources. It is specifically designed to support business intelligence (BI) activities like reporting, data analysis, and decision-making. Some of the more well-known data warehouses are:



Snowflake
Google BigQuery
Amazon Redshift
The topmost image takes data from offline Excel files, cleanses it, then writes the data into a data warehouse. The blue sections indicate the extract piece of the process, which in this case is a connection to an Excel file. The pink indicates the transform step, where data is being altered in some way. The green indicates the load step, where data is being stored somewhere. There are many different transform steps; some of the most common are filters, joins, summarizations, and added columns.
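The extract-transform-load flow described above can be sketched in miniature. This is an illustrative example using only the Python standard library: a CSV string stands in for the Excel source, and an in-memory SQLite database stands in for the warehouse; all table and column names are made up for the sketch.

```python
import csv
import io
import sqlite3

# Extract: read rows from the source (a CSV string standing in for Excel).
source = io.StringIO("YEAR,SALES\n2024,100\n2025,150\n2025,200\n")
rows = list(csv.DictReader(source))

# Transform: keep only 2025 rows and cast the text fields to real types.
cleansed = [(int(r["YEAR"]), float(r["SALES"]))
            for r in rows if r["YEAR"] == "2025"]

# Load: write the cleansed data into the "warehouse".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, sales REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", cleansed)
```

Real pipelines swap each stage for a connector (Excel reader, warehouse writer), but the three-stage shape stays the same.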

Filtering splits the data in two. As an example, if we only wanted data for the year 2025, we would code “YEAR” = ‘2025’. There are always two outputs: one dataset containing only 2025 and another containing everything but 2025.
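The two-output behavior of a filter can be sketched in plain Python. The sample rows and field names here are illustrative, not from the screenshots.

```python
# Sample rows standing in for the dataset being filtered.
rows = [
    {"YEAR": 2024, "SALES": 100},
    {"YEAR": 2025, "SALES": 150},
    {"YEAR": 2025, "SALES": 200},
]

# "True" output: rows where YEAR equals 2025.
matched = [r for r in rows if r["YEAR"] == 2025]
# "False" output: everything but 2025.
unmatched = [r for r in rows if r["YEAR"] != 2025]
```

Every input row lands in exactly one of the two outputs, which is what makes a filter a split rather than a drop.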

A join will split the data three ways. Rows that match the other dataset come through the middle output of the icon. The other two nodes carry the rows that exist in only one dataset but not the other.
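The three outputs of a join can be sketched with two small datasets keyed on a shared column. The "ID", "BRAND", and "PRICE" names below are hypothetical.

```python
left = [{"ID": 1, "BRAND": "Acme"}, {"ID": 2, "BRAND": "Zenith"}]
right = [{"ID": 2, "PRICE": 9.99}, {"ID": 3, "PRICE": 4.50}]

left_ids = {r["ID"] for r in left}
right_ids = {r["ID"] for r in right}

# Middle output: rows whose key exists in both datasets (the inner join).
joined = [{**l, **r} for l in left for r in right if l["ID"] == r["ID"]]

# Side outputs: rows unique to one dataset or the other.
left_only = [r for r in left if r["ID"] not in right_ids]
right_only = [r for r in right if r["ID"] not in left_ids]
```

Here ID 2 matches and flows through the middle; IDs 1 and 3 fall out the sides because each exists in only one dataset.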

Summarizing (AKA a group-by) aggregates data up from a finer granularity; ie the task is to analyze price by brand, but the data is by item. You can aggregate and take the average price by brand using a summarize step.
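The brand-level average described above can be sketched as a group-by in plain Python. The item rows are invented for the example.

```python
from collections import defaultdict

# Item-level data; the goal is average price by brand.
items = [
    {"BRAND": "Acme", "PRICE": 10.0},
    {"BRAND": "Acme", "PRICE": 14.0},
    {"BRAND": "Zenith", "PRICE": 8.0},
]

# Group prices by brand, then take the mean of each group.
prices_by_brand = defaultdict(list)
for row in items:
    prices_by_brand[row["BRAND"]].append(row["PRICE"])

avg_price_by_brand = {brand: sum(p) / len(p)
                      for brand, p in prices_by_brand.items()}
# avg_price_by_brand -> {"Acme": 12.0, "Zenith": 8.0}
```

Note the output has one row per brand rather than one per item: the granularity of the data changes, which is the defining trait of a summarize step.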

Adding columns is just what it sounds like: you create an entirely new column in the dataset. Here you can write any acceptable formula; ie making a “Price” column by dividing “Sales” by “Units”.
There are more than 100 transformation methods.
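The "Price" formula mentioned above can be sketched as a derived column. The sample values are illustrative.

```python
rows = [{"SALES": 100.0, "UNITS": 4}, {"SALES": 90.0, "UNITS": 3}]

# Add-column step: derive PRICE from the existing SALES and UNITS fields.
for row in rows:
    row["PRICE"] = row["SALES"] / row["UNITS"]
# rows[0]["PRICE"] -> 25.0
```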

Alteryx
The bottommost image takes data that is already in the data warehouse or database (in-DB), cleanses it, then writes the data back into that same data warehouse. The benefit is that the computing power to perform the ETL job comes from the data warehouse itself. The data warehouse normally runs on a server or some type of cloud-based infrastructure and will complete the same job far faster than your own personal computer.
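The in-DB idea can be sketched by pushing a transform down to the database engine as SQL instead of pulling rows into the client first. An in-memory SQLite database stands in for the warehouse here, and all table and column names are invented for the sketch.

```python
import sqlite3

# "Warehouse" with item-level data already loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (brand TEXT, item TEXT, price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Acme", "A1", 10.0), ("Acme", "A2", 14.0), ("Zenith", "Z1", 8.0)],
)

# In-DB transform: the summarize runs as a GROUP BY inside the engine,
# and the result lands in a new table in the same database.
conn.execute(
    "CREATE TABLE avg_price AS "
    "SELECT brand, AVG(price) AS avg_price FROM sales GROUP BY brand"
)
result = dict(conn.execute("SELECT brand, avg_price FROM avg_price"))
# result -> {"Acme": 12.0, "Zenith": 8.0}
```

Because the rows never leave the database, the only thing the client sends and receives is SQL, which is why in-DB jobs scale with the warehouse's hardware rather than your laptop's.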