A Very Short Introduction to Zero-Centring and Whitening

December 25, 2024
11:59 pm

Zero-Centring and Whitening: Standardising Data for Machine Learning

Imagine comparing apples, oranges, and bananas in a fruit salad recipe. Some are sweet, others tangy, and their sizes and colours vary widely. To create a harmonious dish, you’d peel the fruits, cut them into uniform pieces, and balance their flavours with just the right amount of seasoning. Zero-centring and whitening do exactly that for data: they standardise and balance variables to ensure that machine learning models process them effectively.

A Brief History of Zero-Centring and Whitening

The origins of zero-centring and whitening lie in statistical pre-processing techniques that gained prominence in the mid-20th century. Building on the work of statisticians such as Ronald Fisher, these ideas became standard in machine learning as datasets and models grew more complex. Today they are routine steps in preparing data for AI and data science, because they put features on a consistent footing before a model sees them.

What Are Zero-Centring and Whitening?

Zero-centring adjusts data so that its mean is zero, like moving the fruit salad’s sweet-and-sour balance to neutral. Whitening, on the other hand, decorrelates variables and ensures uniform variance—imagine evenly dicing the fruit so every bite feels balanced.

In technical terms:

Zero-Centring in Machine Learning: Subtracts each feature's mean from every value of that feature, so that each feature averages to zero.
Whitening in Data Pre-processing: Applies a linear transformation to zero-centred data so that the features are uncorrelated and each has unit variance, which removes linear redundancy between features.
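
In symbols, the two definitions can be sketched as follows. The notation is introduced here for illustration and is an assumption, not part of the original text: x_i is one sample, n the number of samples, μ the vector of feature means, Σ the sample covariance matrix, and W any whitening matrix.

```latex
\begin{align}
\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i, &
\tilde{x}_i &= x_i - \mu
  && \text{(zero-centring)} \\
\Sigma &= \frac{1}{n-1}\sum_{i=1}^{n} \tilde{x}_i \tilde{x}_i^{\top}, &
z_i &= W \tilde{x}_i \quad \text{with} \quad W \Sigma W^{\top} = I
  && \text{(whitening)}
\end{align}
```

The condition on W does not pin down a single matrix, which is why several whitening variants exist.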

Why Are Zero-Centring and Whitening Used?

These techniques are essential for effective machine learning pre-processing because they:

Improve Model Accuracy: Inputs on comparable scales keep any one feature from dominating simply because of its units, which can help models make better predictions.
Speed Up Training: Centred, evenly scaled inputs typically let gradient-based machine learning algorithms converge faster.
Eliminate Redundancy: Whitening removes linear correlations between features, so the same information is not counted twice.

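For example, in a dataset predicting house prices, zero-centring removes the offset of each feature, so that “square footage” and “number of bedrooms” both vary around zero. Zero-centring alone does not change a feature's scale; it is the variance-equalising step of whitening that stops a large-valued feature like “square footage” from overpowering other factors, while the decorrelating step prevents correlated features from duplicating influence.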
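
To make the house price example concrete, here is a minimal sketch using NumPy. The four rows of data are made up for illustration, and the two columns (square footage and number of bedrooms) are assumptions. It shows that subtracting the mean removes each feature's offset but leaves its spread unchanged; the rescaling step is what puts the columns on a comparable scale.

```python
import numpy as np

# Hypothetical toy data: columns are square footage and number of bedrooms.
X = np.array([
    [1200.0, 2.0],
    [1800.0, 3.0],
    [2400.0, 4.0],
    [3000.0, 5.0],
])

# Zero-centring: subtract each column's mean from that column.
X_centred = X - X.mean(axis=0)

# The column means are now zero, but each column keeps its original spread.
centred_means = X_centred.mean(axis=0)
centred_stds = X_centred.std(axis=0)

# Dividing by the standard deviation is what equalises the scales.
X_scaled = X_centred / centred_stds
scaled_stds = X_scaled.std(axis=0)

print("means after centring:", centred_means)
print("spread after centring:", centred_stds)
print("spread after rescaling:", scaled_stds)
```

After centring, square footage still has a spread hundreds of times larger than the bedroom count; only after rescaling do both columns have a standard deviation of one.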

How Are Zero-Centring and Whitening Used?

The process involves:

Calculating the Mean of Features: Determine the average value of each feature.
Zero-Centring Data: Subtract the mean to align features around zero.
Calculating the Covariance Matrix: Measure how each pair of zero-centred features varies together.
Whitening Transformation: Use a technique such as PCA to rotate the data onto uncorrelated axes, then rescale each axis to unit variance.

Tools like Python’s NumPy library, Scikit-learn, and TensorFlow simplify these steps.
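
The four steps above can be sketched in NumPy as follows. This is an illustrative sketch rather than a reference implementation: the function name `pca_whiten`, the small constant `eps` (added to avoid dividing by a near-zero eigenvalue), and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def pca_whiten(X, eps=1e-8):
    """Zero-centre X (shape: n_samples x n_features) and PCA-whiten it."""
    # Step 1: mean of each feature.
    mean = X.mean(axis=0)
    # Step 2: zero-centre the data.
    X_centred = X - mean
    # Step 3: covariance matrix of the features (columns are variables).
    cov = np.cov(X_centred, rowvar=False)
    # Step 4: eigendecomposition gives the principal axes and their variances.
    eigvals, eigvecs = np.linalg.eigh(cov)
    X_rotated = X_centred @ eigvecs               # decorrelate
    X_white = X_rotated / np.sqrt(eigvals + eps)  # equalise variance
    return X_white, mean, eigvecs, eigvals

# Synthetic correlated data with non-zero means (illustrative only).
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.5, 1.0, 0.0],
                   [0.5, 0.3, 0.2]])
X = rng.normal(size=(500, 3)) @ mixing.T + np.array([10.0, -5.0, 3.0])

X_white, mean, eigvecs, eigvals = pca_whiten(X)
cov_white = np.cov(X_white, rowvar=False)
print(np.round(cov_white, 3))
```

If the transformation worked, the printed covariance matrix of the whitened data is approximately the identity: zero off the diagonal (no correlation) and one on the diagonal (unit variance).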

Different Types of Whitening

Whitening can be applied in several ways:

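PCA Whitening: Rotates the data onto its principal components and rescales each to unit variance, which removes correlations; low-variance components can optionally be dropped to reduce dimensions.
ZCA Whitening: Applies PCA whitening and then rotates back to the original axes, so the whitened data stays as close as possible to the original.
Cholesky Whitening: Derives the transformation from a Cholesky decomposition of the covariance matrix (or of its inverse).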

Each approach suits specific data pre-processing needs in machine learning workflows.
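
As a rough sketch of how the three variants differ, the NumPy code below builds a whitening matrix W for each one from the same covariance matrix. The helper name `whitening_matrix`, the `eps` constant, and the synthetic data are assumptions for illustration. The Cholesky branch uses one of several conventions in use (factorising the covariance matrix itself and inverting the factor).

```python
import numpy as np

def whitening_matrix(cov, method="zca", eps=1e-8):
    """Return W such that W @ cov @ W.T is approximately the identity."""
    if method in ("pca", "zca"):
        eigvals, eigvecs = np.linalg.eigh(cov)
        inv_sqrt = np.diag(1.0 / np.sqrt(eigvals + eps))
        W_pca = inv_sqrt @ eigvecs.T     # rotate onto principal axes, rescale
        if method == "pca":
            return W_pca
        return eigvecs @ W_pca           # ZCA: rotate back to original axes
    if method == "cholesky":
        # cov = C @ C.T with C lower triangular, so W = inv(C) whitens.
        C = np.linalg.cholesky(cov)
        return np.linalg.inv(C)
    raise ValueError(f"unknown method: {method}")

# Synthetic correlated data (illustrative only).
rng = np.random.default_rng(1)
mixing = np.array([[1.0, 0.8, 0.2],
                   [0.0, 1.0, 0.5],
                   [0.0, 0.0, 1.0]])
X = rng.normal(size=(400, 3)) @ mixing
X_centred = X - X.mean(axis=0)
cov = np.cov(X_centred, rowvar=False)

# With samples stored as rows, the transform is applied as X_centred @ W.T.
whitened = {
    method: X_centred @ whitening_matrix(cov, method).T
    for method in ("pca", "zca", "cholesky")
}
for method, Z in whitened.items():
    print(method, np.round(np.cov(Z, rowvar=False), 3))
```

All three outputs have (approximately) identity covariance; they differ only by a rotation. The ZCA matrix is symmetric, which is what keeps the whitened features aligned with the original ones.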

Categories of Zero-Centring and Whitening

Zero-centring and whitening are categorised by application:

Feature-Level Pre-processing: Applied to individual variables, using statistics computed once across the whole dataset.
Batch-Level Pre-processing: Applied across data batches, with statistics recomputed for each batch; common in deep learning pipelines.
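
The difference between the two categories can be sketched in NumPy as follows. The helper `standardise`, the batch size of 32, and the synthetic data are assumptions for illustration. For brevity the sketch only centres and rescales each feature, as batch normalisation layers in deep learning do; it does not decorrelate the features.

```python
import numpy as np

def standardise(batch, eps=1e-5):
    """Centre and rescale each column using the statistics of `batch` itself."""
    mean = batch.mean(axis=0)
    var = batch.var(axis=0)
    return (batch - mean) / np.sqrt(var + eps)

# Synthetic data: two features with very different means and scales.
rng = np.random.default_rng(2)
X = rng.normal(loc=[50.0, -3.0], scale=[10.0, 0.5], size=(256, 2))

# Feature-level: one set of statistics from the whole dataset.
X_feature_level = standardise(X)

# Batch-level: statistics recomputed for each mini-batch of 32 rows.
batches = np.split(X, 8)
X_batch_level = np.concatenate([standardise(b) for b in batches])

print("whole-dataset mean:", np.round(X_feature_level.mean(axis=0), 6))
print("first-batch mean:  ", np.round(X_batch_level[:32].mean(axis=0), 6))
```

Both approaches give zero-mean features, but the batch-level result depends on which rows happen to share a batch, so the same row can map to slightly different values under the two schemes.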

Software and Tools for Zero-Centring and Whitening

The following tools are commonly used:

Python Libraries: NumPy, Scikit-learn.
Deep Learning Frameworks: TensorFlow, PyTorch.
MATLAB for Data Analysis: Advanced statistical tools.
R Programming: For statistical computing and pre-processing.

These tools empower data scientists to apply zero-centring and whitening techniques seamlessly.
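
As one example of what this looks like in practice, the sketch below uses Scikit-learn, assuming it is installed alongside NumPy. `StandardScaler(with_std=False)` performs zero-centring only, and `PCA(whiten=True)` centres the data internally and then applies PCA whitening. The synthetic data is illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic correlated data with a non-zero mean (illustrative only).
rng = np.random.default_rng(3)
mixing = np.array([[1.0, 0.6, 0.0],
                   [0.0, 1.0, 0.4],
                   [0.0, 0.0, 1.0]])
X = rng.normal(size=(300, 3)) @ mixing + 5.0

# Zero-centring only: subtract the per-feature mean, leave the scale alone.
centrer = StandardScaler(with_std=False)
X_centred = centrer.fit_transform(X)

# PCA whitening: PCA centres the data itself, then decorrelates and rescales.
pca = PCA(whiten=True)
X_white = pca.fit_transform(X)

cov_white = np.cov(X_white, rowvar=False)
print(np.round(cov_white, 3))
```

In a real workflow, both transformers would be fitted on the training data only and then applied to new data with `transform`, so that test data does not leak into the statistics.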

Industry Applications in Australian Government Agencies

Health Data Analysis: The Australian Institute of Health and Welfare uses zero-centring and whitening to normalise health data for public health prediction models.
Environmental Monitoring: Geoscience Australia applies whitening to satellite data, improving accuracy in mapping and resource management.
Traffic Flow Optimisation: Transport for NSW pre-processes traffic data with whitening techniques, optimising road usage and reducing congestion.
