Discussions surrounding the responsible adoption of AI are increasingly taking centre stage in boardrooms across Africa. A significant number of CEOs are grappling with the ethical complexities inherent in implementing this powerful technology. One pressing issue increasingly under debate is the environmental footprint of AI.
By Michael Langeveld, head of technology and business development at HPE Emirates and Africa
As we witness the escalating impact of climate change, with more frequent and intense adverse weather events sweeping across the continent, the call to prioritise environmental initiatives has never been more urgent. As it stands, however, a mere 32% of African leaders are confident that their companies will meet their net zero commitments.
In light of this, the substantial energy consumption of AI workloads has become a growing concern for many. To put it in perspective, AI models rely on incredibly powerful computers that are voracious consumers of electricity.
A typical AI-centric data centre can burn through as much electricity as 100 000 households. Even more worrying, the largest data centres under construction today are projected to use 20 times that amount. Given that most of our electricity generation still hinges on greenhouse gas-emitting sources, the unchecked rise of AI could significantly amplify carbon emissions. This is a particular consideration in South Africa, where businesses continue to rely heavily on fossil fuels.
While navigating this complex issue might seem daunting, HPE has created a systematic approach to help organisations avoid getting bogged down in the many variables of the AI sustainability conundrum. It breaks down AI sustainability into five key areas: equipment efficiency, energy efficiency, resource efficiency, software efficiency, and data efficiency. Essentially, it's about getting more from less: maximising system efficacy to achieve greater output with fewer resources.
While each of these areas is crucial, data efficiency is generally a good place to begin. Given how data-intensive AI workloads are, optimising the data sets fed into AI models can have a profound impact.
Around three quarters of African CEOs are not confident about having their data prepared for the safe and effective use of generative AI, so this is a particularly important point of focus for local leaders. Addressing data efficiency as a priority can set a solid foundation for broader AI sustainability efforts.
First steps to data efficiency
- Map out your data strategy upfront: Start by knowing what data you need, where it will come from, how often you'll collect it, the process you'll use to gain insights from it (for example, which AI models you'll use), how data will be moved between systems, and where and for how long you'll store it. Can data be consolidated, disposed of, or stored using low-impact techniques, such as tape or other backup methods? Data that does not need to be retrieved immediately can often be offloaded to lower-energy media.
- Clean up before you start: For traditional workloads, data efficiency meant storing only the data that would be used to generate business value. For AI workloads, however, data sets need to be adequately sized, cleaned, and optimised BEFORE training a model. Simply using off-the-shelf data sets or repositories without minimising them before model tuning means doing unnecessary work and making the AI solution work harder.
- Get the training data set right: Optimising the data set before training is a key part of AI sustainability; customer-specific data can then be layered in as the model is tuned. By starting with data efficiency, and keeping the data population as concise as possible from the earliest stages, you drive efficiency all the way through the process.
- Process data only once: Data used for training or tuning should be processed only once, with additional retraining or fine-tuning happening only on the new data being collected.
- Avoid data debt: Managing and maintaining data becomes especially critical with AI workloads because they require such massive amounts of data, including unstructured data. One way to ease the pressure on data storage systems is to get rid of inaccurate, erroneous, out-of-date, or duplicated data. Data debt, like technical debt, becomes problematic in AI systems because AI results hinge on the data fed into the models.
- Location matters: Data should be processed as close to its original location as possible, both to minimise the energy cost of moving it and to preserve the timeliness of the information.
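To make the clean-up and deduplication steps above concrete, here is a minimal Python sketch of filtering a data set before it is used for training: dropping empty, out-of-date, and duplicate records. The field names, sample records, and cutoff date are purely illustrative assumptions, not part of any HPE tooling.

```python
# Hypothetical records; field names and values are illustrative assumptions.
records = [
    {"id": 1, "text": "Pump vibration exceeds threshold", "updated": "2024-05-01"},
    {"id": 2, "text": "Pump vibration exceeds threshold", "updated": "2024-05-01"},  # duplicate
    {"id": 3, "text": "", "updated": "2024-04-02"},                                  # empty/erroneous
    {"id": 4, "text": "Coolant temperature nominal", "updated": "2019-01-15"},       # out of date
]

def prepare_training_set(records, cutoff="2023-01-01"):
    """Drop empty, stale, and duplicate records before training."""
    seen = set()
    cleaned = []
    for r in records:
        text = r["text"].strip()
        if not text:                # skip empty or erroneous entries
            continue
        if r["updated"] < cutoff:   # skip out-of-date data (ISO dates compare lexicographically)
            continue
        key = text.lower()
        if key in seen:             # skip duplicate content
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned

cleaned = prepare_training_set(records)
# Only record 1 survives: 2 is a duplicate, 3 is empty, 4 is stale.
```

Doing this kind of minimisation once, before training, means the smaller data set flows through every downstream stage: storage, movement, and model tuning all work on less data.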