GSMA launches Open-Telco LLM Benchmarks

The GSMA Foundry has launched GSMA Open-Telco LLM Benchmarks, an open-source community aimed at improving the performance of large language models (LLMs) for telecom-specific applications.

The community provides an industry-first framework for evaluating artificial intelligence (AI) models in real-world telecom use cases and is supported at launch by Hugging Face, Khalifa University, The Linux Foundation and a host of mobile network operators and vendors.

As AI adoption in telecoms accelerates, LLMs have demonstrated significant shortcomings in handling technical telecom knowledge, regulatory compliance and network troubleshooting.

In recent tests, GPT-4 scored less than 75% on TeleQnA, a comprehensive dataset tailored to assess the knowledge of LLMs in the field of telecommunications, and less than 40% on 3GPPTdocs Classification, a dataset based on 3GPP standards documentation. Microsoft’s Phi-2, a much smaller model, scored only 10% on MATH500, a benchmark of 500 general maths questions.

These results highlight the current limitations of AI models in addressing telecom-specific queries. GSMA Open-Telco LLM Benchmarks will address these gaps by providing transparent, open evaluations of AI models across capabilities, energy efficiency and safety.

“Today’s AI models struggle with telecom-specific queries, often producing inaccurate, misleading or impractical recommendations,” says Louis Powell, head of AI initiatives at GSMA. “By creating an industry-wide set of benchmarks, we’re not only improving model performance but also ensuring AI in telecoms is safe, reliable and aligned with real-world operational needs.”

The launch of GSMA Open-Telco LLM Benchmarks is supported by mobile network operators Deutsche Telekom, LG Uplus, SK Telecom and Turkcell, and by technology vendor Huawei.

The GSMA Open-Telco LLM Benchmarks community enables mobile network operators, AI researchers and developers to submit use cases, datasets and models for evaluation. A standardised benchmarking framework ensures that all AI models are evaluated against real-world challenges in areas such as telecoms domain knowledge, mathematical reasoning, energy consumption and safety.
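As a rough illustration of what scoring a model against a fixed question set can look like, the sketch below computes per-category accuracy for a set of multiple-choice answers. It is not the GSMA framework: the category names, question IDs, answer key and the function `score_by_category` are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical benchmark items: (category, question id, correct option).
ANSWER_KEY = [
    ("telecom_knowledge", "q1", "B"),
    ("telecom_knowledge", "q2", "D"),
    ("math_reasoning", "q3", "A"),
    ("math_reasoning", "q4", "C"),
]


def score_by_category(answer_key, model_answers):
    """Return the fraction of correct answers per category."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, qid, expected in answer_key:
        total[category] += 1
        # A missing answer counts as incorrect.
        if model_answers.get(qid) == expected:
            correct[category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


# Hypothetical model output, keyed by question id.
model_answers = {"q1": "B", "q2": "A", "q3": "A", "q4": "C"}
scores = score_by_category(ANSWER_KEY, model_answers)
print(scores)
```

Reporting a separate score per category, rather than one overall figure, makes it easier to see where a model is weak, for example strong on general maths but poor on telecom knowledge.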

The resulting benchmarks will be hosted on Hugging Face to ensure transparency and encourage community engagement.

Mobile network operators, vendors, startups and researchers are encouraged to contribute by registering their interest and submitting telco LLM use cases.

Jeff Boudier, head of product and growth at Hugging Face, comments: “Hugging Face is the leading open platform for AI builders, and we’re thrilled to support and host the GSMA Open-Telco LLM Benchmarks to advance telecoms AI adoption and innovation.”

According to Professor Merouane Debbah, director of the 6G Research Centre at Khalifa University: “Academia plays a crucial role in advancing AI for telecommunications by ensuring rigorous benchmarking and scientific integrity. At Khalifa University, we are proud to support the GSMA Open-Telco LLM Benchmarks initiative. This effort will drive innovation and enhance the reliability of AI models in real-world telecom applications.”

Sangyeob Lee, chief technology officer of LG Uplus, says: “We stand at a turning point, heading for human and AI agent coexistence, and the telcos will play a vital role in establishing safe, autonomous connections between them. LG Uplus is committed to this AI agent innovation through LLM advancement and welcomes GSMA Open-Telco LLM Benchmarks as our guiding light towards the assured intelligence services that we pursue.”

“The launch of GSMA’s Open-Telco LLM Benchmarks marks a significant milestone in advancing AI adoption across the telecom industry,” says Arpit Joshipura, GM: networking, edge and IoT at The Linux Foundation. “By establishing open, standardised benchmarks, this initiative brings much-needed transparency and performance insights, enabling operators and ecosystem partners to deploy domain-specific AI with confidence. The Linux Foundation supports this effort, as it aligns with our vision of open collaboration to drive innovation and efficiency in telecom networks worldwide.”

Eric Davis, head of the AI tech collaboration office at SK Telecom, adds: “The introduction of GSMA Open-Telco LLM Benchmarks marks a pivotal milestone for the telecommunications industry in its pursuit of tangible AI benefits. By establishing a standardised evaluation framework, we’re simultaneously driving innovation and ensuring AI solutions deliver the robustness, reliability and precision that our rapidly evolving sector demands.”

The launch follows last year’s industry-wide commitment to exploring telco AI use cases ethically and sustainably. Central to that commitment was the GSMA’s Responsible AI Maturity Roadmap, which helps mobile network operators (MNOs) ensure best-practice principles are applied from inception through evolution.