Developers now have a new AI-powered steering wheel to help them hug the road while they drive powerful large language models (LLMs) to their desired destinations. Nvidia NeMo SteerLM lets companies define knobs to dial in a model’s responses while it’s running in production – a process called inference.
Unlike current methods for customising an LLM, it lets a single training run create one model that can serve dozens or even hundreds of use cases, saving time and money.
Nvidia researchers created SteerLM to teach AI models what users care about, like road signs to follow in their particular use cases or markets. These user-defined attributes can gauge almost anything – for example, the degree of helpfulness or humour in the model’s responses.
The result is a new level of flexibility.
With SteerLM, users define all the attributes they want and embed them in a single model. Then they can choose the combination they need for a given use case while the model is running.
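In practice, this amounts to passing the chosen attribute values alongside the prompt at inference time. A minimal sketch of the idea is below; the `<attributes …>` label syntax and the `steer_prompt` helper are illustrative assumptions, not the NeMo API.

```python
# Sketch: selecting attribute values at inference time.
# The tag format below is a hypothetical stand-in for however the
# deployed model was trained to read its attribute labels.

def steer_prompt(user_prompt: str, **attributes: int) -> str:
    """Prepend attribute settings (e.g. on a 0-9 scale) to a prompt."""
    tags = ",".join(f"{name}:{value}" for name, value in sorted(attributes.items()))
    return f"<attributes {tags}> {user_prompt}"

# Same model, two use cases – only the dials change.
legal = steer_prompt("Summarise the contract.", formality=9, humour=0)
support = steer_prompt("Summarise the contract.", formality=4, humour=6)
```

Because the attributes live in the input rather than in the weights, switching use cases costs nothing more than changing these values per request.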
For example, a custom model can now be tuned during inference to the unique needs of, say, an accounting, sales, or engineering department or a vertical market.
The method also enables a continuous improvement cycle. Responses from a custom model can serve as data for a future training run that dials the model into new levels of usefulness.
To date, fitting a generative AI model to the needs of a specific application has been the equivalent of rebuilding an engine’s transmission. Developers had to painstakingly label datasets, write lots of new code, adjust the hyperparameters under the hood of the neural network, and retrain the model several times.
SteerLM replaces those complex, time-consuming processes with three simple steps:
* Using a basic set of prompts, responses and desired attributes, customising an AI model that predicts how well any response exhibits those attributes.
* Automatically generating a dataset using this model.
* Training the model with the dataset using standard supervised fine-tuning techniques.
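The three steps above can be sketched in miniature. Everything here is an illustrative stand-in under stated assumptions – the heuristic attribute scorer, the annotation format and the `<attributes …>` conditioning syntax are invented for the example and are not NeMo functions.

```python
# Toy walk-through of the three SteerLM steps.

def attribute_model(response: str) -> dict:
    # Step 1 (stand-in): a trained predictor would score each response
    # on the user-defined attributes; here, a crude heuristic suffices.
    return {
        "helpfulness": min(9, len(response.split()) // 3),
        "humour": 9 if "!" in response else 1,
    }

def annotate(dataset):
    # Step 2: automatically label every response with predicted attributes.
    return [
        {"prompt": p, "response": r, "attributes": attribute_model(r)}
        for p, r in dataset
    ]

def to_sft_examples(annotated):
    # Step 3: build supervised fine-tuning pairs whose inputs carry the
    # attribute labels, so the model learns attribute-conditioned generation.
    examples = []
    for row in annotated:
        tags = ",".join(f"{k}:{v}" for k, v in sorted(row["attributes"].items()))
        examples.append((f"<attributes {tags}> {row['prompt']}", row["response"]))
    return examples

data = [("Explain LLMs.", "Large language models predict the next token.")]
sft = to_sft_examples(annotate(data))
```

The payoff of structuring the data this way is that standard supervised fine-tuning – no bespoke training loop – teaches the model to respect whatever attribute values appear in the input.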
Developers can adapt SteerLM to nearly any enterprise use case that requires generating text.
With SteerLM, a company might produce a single chatbot it can tailor in real time to customers’ changing attitudes, demographics, or circumstances in the many vertical markets or geographies it serves.
SteerLM also enables a single LLM to act as a flexible writing co-pilot for an entire corporation.
For example, lawyers can modify their model during inference to adopt a formal style for their legal communications. Or marketing staff can dial in a more conversational style for their audience.
To show the potential of SteerLM, Nvidia demonstrated it on one of its classic applications: gaming.
Today, some games pack dozens of non-playable characters – characters that the player can’t control – which mechanically repeat prerecorded text, regardless of the user or situation.
SteerLM makes these characters come alive, responding with more personality and emotion to players’ prompts. It’s a tool game developers can use to unlock unique new experiences for every player.
The concept behind the new method arrived unexpectedly.
“I woke up early one morning with this idea, so I jumped up and wrote it down,” says Yi Dong, an applied research scientist at Nvidia who initiated the work on SteerLM.
While building a prototype, he realised a popular model-conditioning technique could also be part of the method. Once all the pieces came together and his experiment worked, the team helped articulate the method in three simple steps.
It’s the latest advance in model customisation, a hot area in AI research.
“It’s a challenging field – a kind of holy grail for making AI more closely reflect a human perspective – and I love a new challenge,” says Dong, who earned a PhD in computational neuroscience at Johns Hopkins University then worked on machine learning algorithms in finance before joining Nvidia.