While the emergence of cloud services gave organisations of all sizes access to powerful computing resources, many hasty and poorly planned migrations meant that the proper data infrastructure was never put in place from the outset.

By Keagan Jordan, technical engagement lead: data, analytics, and AI and software engineering at Altron Karabina

Without this solid foundation, they are unable to harness the full capability of the cloud, including making the most of technologies such as artificial intelligence.

Previously, organisations wanting to use artificial intelligence had to invest significantly in on-premises server infrastructure, setting up data warehouses and putting in place the teams of people needed to keep the system running and ensure that all the required data was available to business users.

However, with the advent of cloud computing, gaining access to such capability was as simple as having one or two skilled people who could easily set up a bare-bones data warehouse in the cloud. After all, the initial requirements from the business were just for basic reporting and insights.

But then businesses wanted much more, including capabilities such as artificial intelligence, only to realise that they still had a long journey ahead: laying the engineering foundations, changing the way they approach the processing of data and ensuring that the data is in the correct structure.

Then, there was the realisation that implementing a data lakehouse – which can store vast amounts of both structured and unstructured data as compared to a more limited data warehouse – could play a significant role in moving them along their data and AI journey.

Change in data processing approach

Here, the biggest shift has been the move from an ETL (extract, transform and load) approach to an ELT (extract, load and transform) approach, which ensures that data in its rawest form remains available to work with afterwards. ETL originally prevailed because of on-premises infrastructure limitations, but with access to the power and scalability of the cloud, organisations can now load raw data first and transform it later, making that raw data available for building generative and predictive models.
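To make the ordering concrete, here is a minimal ELT sketch in Python, using pandas and SQLite purely as stand-ins for a cloud warehouse; the file, table and column names are hypothetical. The raw extract is loaded untouched first, and the transformation happens afterwards, inside the store, where it can always be re-run against the original data.

```python
import sqlite3
import pandas as pd

# Hypothetical source extract: raw sales records exactly as exported by the source system.
raw = pd.read_csv("sales_export.csv")  # assumed file name

# Load first: land the untouched data in a raw table so nothing is lost or pre-filtered.
conn = sqlite3.connect("warehouse.db")
raw.to_sql("raw_sales", conn, if_exists="replace", index=False)

# Transform afterwards, inside the store, once the raw copy is safely landed.
conn.execute("DROP TABLE IF EXISTS clean_sales")
conn.execute("""
    CREATE TABLE clean_sales AS
    SELECT
        CAST(order_id AS INTEGER) AS order_id,
        DATE(order_date)          AS order_date,
        UPPER(TRIM(region))       AS region,
        CAST(amount AS REAL)      AS amount
    FROM raw_sales
    WHERE amount IS NOT NULL
""")
conn.commit()
```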

Traditionally, developers would want to start working with data as soon as they had access to it, creating reports, building models and much more, but the landscape has changed. Now, data first needs to be landed in an organised, layered structure before anything useful can be built on top of it. The Medallion Architecture coined by Databricks describes this as bronze, silver and gold layers, with bronze being the absolute raw data as it comes from the source system.

However, it is not as simple as just dumping the data from the business systems; there needs to be a structure in place so that the data can easily be found.
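As a rough, simplified sketch of what that layered structure might look like in PySpark (the paths, table names and columns are placeholders, and a real pipeline would add schema enforcement, incremental loads and Delta-format tables):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land the source data exactly as received, in a predictable, discoverable location.
bronze = spark.read.json("/landing/erp/orders/")  # hypothetical source path
bronze.write.mode("overwrite").saveAsTable("bronze_orders")

# Silver: cleaned and conformed data - types fixed, duplicates removed, bad rows filtered out.
silver = (
    spark.table("bronze_orders")
         .dropDuplicates(["order_id"])
         .withColumn("order_date", F.to_date("order_date"))
         .filter(F.col("amount") > 0)
)
silver.write.mode("overwrite").saveAsTable("silver_orders")

# Gold: business-level aggregates that reporting and predictive models both read from.
gold = silver.groupBy("region", "order_date").agg(F.sum("amount").alias("daily_sales"))
gold.write.mode("overwrite").saveAsTable("gold_daily_sales")
```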

One can think of this as a fork in the road – one side toward analytics, insights and day-to-day business reporting and the other side toward predictive models and AI. But, if the initial road isn’t paved, organisations are not even going to get to the fork. Both roads are built off the same data, but this data needs to be loaded in a way that it can be used in both ecosystems.

Consolidation of data sources is crucial

This is just the first step, though; just because there is raw data doesn't mean that it should be used immediately. Organisations still need to get to the next step of transforming their data, while ensuring that their data lakehouses also bring in information from all other business systems, such as those for finance, sales, HR and more, in order to provide a holistic view. After all, how can a business build a predictive model if it can't see what the month-to-month figures were?
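As a toy illustration of what that consolidation enables, the sketch below joins invented monthly figures from two hypothetical systems so that a single view can answer the month-to-month question; in a real lakehouse this would simply be a query over the consolidated, curated tables.

```python
import pandas as pd

# Invented monthly extracts from two separate, hypothetical business systems.
sales = pd.DataFrame({
    "month":   ["2024-01", "2024-02", "2024-03"],
    "revenue": [120_000, 135_000, 128_000],
})
finance = pd.DataFrame({
    "month": ["2024-01", "2024-02", "2024-03"],
    "costs": [90_000, 96_000, 99_000],
})

# Consolidate the two sources on a shared key so one view answers the question.
monthly = sales.merge(finance, on="month")
monthly["margin"] = monthly["revenue"] - monthly["costs"]
monthly["margin_change"] = monthly["margin"].diff()  # month-to-month movement

print(monthly)
```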

This in itself is not new, as business intelligence and data analytics have always been seen as the consolidation of all of a business’s data in a single place so that reports queried by business leaders are accurate and up to date. This enables them to get answers to multiple questions without having to query individual departments for updates.

When time is of the essence, decision-makers don’t want to have to go to another application or have to get light training on how to use various business systems should they require more information – they just want a single platform that will be able to give them the answers to the questions that they have.

Using technology correctly

When looking at business use cases of AI, the first, which has captured much of the world's attention over the past year, is generative AI: the models that power applications such as ChatGPT, allowing people to ask questions in natural language and get useful answers, including about specific areas of business performance. Then there is predictive analytics, where data is analysed to identify trends and predict what is likely to happen in future.

This is especially useful when a business is looking to make significant investments in improvements and wants to first determine whether the right conditions (supply, demand, etc.) exist to ensure return on investment. For example, a fishery company might want to invest in a new fleet, and AI can be used to harness data from current fishing expeditions in order to predict what will happen to the fish population over time, and whether there will be enough fish over the coming years to justify the spend.
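Stripped down to its essence, such a forecast is a model fitted to historical data and projected forward. The sketch below uses invented catch figures and a deliberately simple linear trend; a real model would draw on far richer data (stock surveys, ocean conditions, quotas) and more suitable techniques.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented example: yearly catch volumes (in tonnes) from past expeditions.
years = np.array([2019, 2020, 2021, 2022, 2023]).reshape(-1, 1)
catch_tonnes = np.array([5200, 5050, 4900, 4780, 4600])

# Fit a simple trend model; a real forecast would use far richer features and methods.
model = LinearRegression().fit(years, catch_tonnes)

# Project forward to see whether volumes are likely to justify the investment.
future = np.array([[2026], [2028], [2030]])
for year, tonnes in zip(future.ravel(), model.predict(future)):
    print(f"{year}: projected catch of roughly {tonnes:,.0f} tonnes")
```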

Getting this right, however, requires that the first steps of an organisation's AI journey are well architected and well implemented before it can move on to predictive analytics. This means organisations often do not see an immediate return on their investment in the technology, but this is something that CIOs, CTOs and IT departments understand.

Taking the extra effort and time to lay the foundations and make sure that IT infrastructure and data structures are correct will enable those organisations to progress by harnessing more use cases for this technology as they come up in future.