Eighty percent of enterprise software and applications will be multimodal by 2030, up from less than 10% in 2024, according to Gartner.

“The shift to multimodal enterprise software is a fundamental transformation in business operations and innovation,” says Roberta Cozza, senior director analyst at Gartner. “Multimodal generative AI (GenAI) will revolutionize enterprise applications by adding previously unattainable features and functionalities, impacting sectors like healthcare, finance, and manufacturing.

“By enhancing domain-specific language models, it will improve accuracy, automate operations, and drive contextual decision intelligence, enabling AI to take proactive actions across tasks.”

High-impact technologies such as multimodal GenAI models are at the center of Gartner’s Emerging Tech Impact Radar for GenAI. Product leaders will have to make critical decisions on investing in these emerging GenAI technologies to enable customers to reach new heights of value in their business.

Multimodal GenAI can use multiple types of data inputs and outputs, such as images, videos, audio (speech), text, and numerical data, within a single generative model. Multimodality makes GenAI more usable by allowing models to interact with data and create outputs in various modalities.

Today, many multimodal models process two or three modalities (e.g., text-to-video or speech-to-image). Over the next few years, the number of supported modalities will increase to include more diverse and new ones.

“Enterprises should focus on integrating multimodal capabilities into their software to enhance user experiences and operational efficiency. By leveraging the diverse data inputs and outputs that multimodal GenAI offers, businesses can unlock new levels of productivity and innovation,” says Cozza.