The LLM Capacity Index: One Index to Rule Them All and Comprehensively Evaluate an LLM’s Performance in the Telco Domain

Intro

Benchmarking Large Language Models (LLMs) is a tricky endeavor. There has been a lot of focus on model performance, but results on standard benchmark suites tend to have very little correlation with real-world use cases. Beyond performance, other factors (especially cost and latency) need to be deeply considered when applying and adopting LLMs in real-world use cases, services, and products. To paint a more holistic picture of the metrics that truly matter to business cases, we have created the LLM Capacity Index. This Index measures not only LLM performance but also model speed, cost, efficiency, and time to market.

 

Performance

LLM results on standard benchmarking suites, e.g., Massive Multitask Language Understanding (MMLU), are informative in terms of ranking models but tend to have little correlation with real-world model performance in a business setting. Business use cases tend to focus on narrower domains, such as customer service, and the intricacies of such domains are not reflected in standard benchmarks. There have been some efforts to create domain-specific benchmarks, such as LegalBench, but there is no industry standard for the Telco domain.

We aim to address these shortcomings with our proposed performance metric, the Telco Expertise Score (TES). TES encompasses an LLM’s capabilities on important Telco tasks. The first version of TES covers these five important tasks:

  • Conversation Summarization - Ability to summarize Telco call logs

  • Intent Recognition - Ability to classify intents (user needs) in the Telco domain, e.g., subscribe to roaming plan

  • Topic Generation - Ability to generate Telco related topics, e.g., Roaming Plans

  • Planning - Ability to break a user inquiry into sub-tasks with clearly defined actions

  • Response Generation - Ability to generate responses with Telco jargon and specific to the Telco domain

The second version of TES will be expanded to include the following tasks (10 tasks total):

  • Machine Reading Comprehension - Ability to understand and answer questions about long documents from the Telco domain

  • Safety - Ability to not hallucinate products and services

  • Expert Q&A - Ability to answer questions about Telco jargon, products, and services

  • Tool Use - Ability to search for documents and call Telco-related APIs

  • EQ - Ability to understand complex emotions and social interactions in Telco conversations
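The text above does not specify how the per-task results are combined into a single TES value. As an illustration only, the sketch below assumes each task is scored on a 0-100 scale and that TES is their unweighted mean; the task names and scores are hypothetical.

```python
# Hypothetical sketch: aggregating per-task scores into one TES value.
# The aggregation method (unweighted mean) and the scores are assumptions,
# not the authors' actual methodology.

def telco_expertise_score(task_scores: dict[str, float]) -> float:
    """Average the per-task scores (each on a 0-100 scale) into one TES value."""
    if not task_scores:
        raise ValueError("at least one task score is required")
    return sum(task_scores.values()) / len(task_scores)

# Illustrative scores for the five v1 tasks
scores = {
    "conversation_summarization": 82.0,
    "intent_recognition": 91.5,
    "topic_generation": 78.0,
    "planning": 74.5,
    "response_generation": 80.0,
}
print(round(telco_expertise_score(scores), 2))  # mean of the five task scores
```

A weighted mean would be a natural extension if some tasks matter more to a given business use case.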

 

The Holy Trinity

While performance is important, in real-world settings what we’ve deemed the holy trinity (speed, cost, and efficiency) is equally critical. Taking these factors into account provides the best user experience and maximizes ROI from a business perspective.


Speed

There are always trade-offs between performance and ‘speed’, but end users typically don’t want to wait long for responses from a model. LLM inference has never been blazingly fast, although speeds continue to improve with innovations like FlashAttention. Additionally, model providers typically offer streaming responses, so users can receive results incrementally. Speed remains a largely unresolved issue, however, so it’s an important focus when evaluating LLM performance holistically.

For us, ‘speed’ consists of 5 aspects: Total Response Time (TRT), Latency, Throughput, Performance versus Throughput, and Throughput versus Latency. There are no real industry standards, but we’ve defined TRT, Latency, and Throughput as follows:

  • TRT: Amount of time (seconds) to output 100 tokens

  • Latency: Time (seconds) until the first token chunk is received

  • Throughput: Tokens per second

Ideally models have a low TRT, low Latency, and high Throughput. 
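As a rough illustration of these three definitions (not the authors’ actual measurement harness), the sketch below times a token stream. A stand-in generator plays the role of a real streaming LLM API, which is an assumption for the sake of a self-contained example.

```python
# Minimal sketch: measuring TRT, Latency, and Throughput from a token stream,
# per the definitions above. `fake_stream` is a stand-in for a streaming API.
import time

def fake_stream(n_tokens=100, delay=0.001):
    """Stand-in for a streaming LLM API that yields one token at a time."""
    for i in range(n_tokens):
        time.sleep(delay)
        yield f"tok{i}"

def measure_speed(stream, trt_window=100):
    start = time.perf_counter()
    latency = None
    trt = None
    count = 0
    for _ in stream:
        count += 1
        now = time.perf_counter()
        if latency is None:
            latency = now - start   # Latency: time to first token chunk (s)
        if count == trt_window:
            trt = now - start       # TRT: time to output 100 tokens (s)
    total = time.perf_counter() - start
    throughput = count / total      # Throughput: tokens per second
    return {"TRT": trt, "Latency": latency, "Throughput": throughput}

metrics = measure_speed(fake_stream())
```

In practice, real measurements would also average over many requests and prompt lengths, since both latency and throughput vary with load.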

Cost

Cost is another determining factor when evaluating LLMs. Bigger proprietary models typically have the best task performance, but this comes at a higher price point. It’s important to balance model task performance, speed, and price in real-world use cases.

Proprietary models typically have different prices for input and output tokens. This makes comparisons difficult, so we’re using a combined cost. We’ve seen that in real-world scenarios, the ratio of input to output tokens is around 3 to 1. As such, we calculate cost using this ratio and define it as:

  • Cost: USD per 1 million tokens
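The blended cost described above can be sketched as a weighted average of input and output prices at the stated 3:1 ratio. The prices in the example below are illustrative, not actual vendor quotes.

```python
# Sketch of the blended cost calculation, assuming the 3:1 input-to-output
# token ratio stated above. Example prices are illustrative only.

def blended_cost_per_million(input_price: float, output_price: float,
                             input_ratio: int = 3, output_ratio: int = 1) -> float:
    """Blend input/output prices (USD per 1M tokens) at the given ratio."""
    total = input_ratio + output_ratio
    return (input_price * input_ratio + output_price * output_ratio) / total

# Example: $3 per 1M input tokens, $15 per 1M output tokens
print(blended_cost_per_million(3.0, 15.0))  # (3*3 + 15*1) / 4 = 6.0
```

If a workload’s actual input-to-output ratio differs from 3:1, the ratio arguments can be adjusted to match observed traffic.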

This number is interesting in itself, but to make an informed decision, it’s also important to take LLM task performance and throughput into consideration. As such, we look at Performance versus Cost and Throughput versus Cost. To drive ROI in real-world use cases, it is important to keep cost as low as possible while maximizing task Performance and Throughput.

Efficiency

By tuning the LLM for the Telco domain, our main aim is to improve model performance. That said, adapting the LLM to the Telco domain also brings two additional benefits: lower inference time and lower cost. Since fine-tuned LLMs do not require few-shot in-context examples, inference becomes more efficient, especially in complex applications that make iterative calls to the LLM to satisfy user inquiries. The smaller context also adds up over time in terms of cost, as cost depends on input and output tokens. Tuning the model obviates the need for lengthy system prompts and few-shot examples, thereby reducing the input context. Tuning also leads to more specific responses, which reduces the number of output tokens. Together, the reduction in input and output tokens lowers overall cost.
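To make the savings concrete, here is a back-of-the-envelope sketch. The prompt overhead, call volume, and blended price are all assumed numbers for illustration, not measurements from the text.

```python
# Back-of-the-envelope sketch of the savings described above: a tuned model
# drops the few-shot examples and long system prompt from every call.
# All three constants below are illustrative assumptions.

PROMPT_OVERHEAD = 1500    # system prompt + few-shot examples (tokens per call)
CALLS_PER_DAY = 100_000   # assumed daily call volume
BLENDED_PRICE = 6.0       # assumed blended cost, USD per 1M tokens

tokens_saved_per_day = PROMPT_OVERHEAD * CALLS_PER_DAY
usd_saved_per_day = tokens_saved_per_day / 1_000_000 * BLENDED_PRICE
print(f"${usd_saved_per_day:,.2f} saved per day")  # $900.00 saved per day
```

Even at modest call volumes, per-call prompt overhead compounds quickly, which is why removing it through tuning matters.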

Time to Market

One final consideration when holistically benchmarking models is time to market (TTM). Traditionally, bringing an AI-powered service to market is time-consuming and can take up to a year. It is also expensive, as it requires a lot of in-house expertise, including UX and ML experts. LLMs have made shipping a service or product easier and less time-consuming, but delivering a fantastic experience still demands significant resources (person power and expenses).

This is one of the true benefits of adapting an LLM to the Telco domain. Tuning the model not only improves performance in terms of tone, style, and manner; it also allows us to “bake in” domain expertise, including business logic. This means that when you develop an application or service, you are no longer starting at the ground floor. Thanks to the Telco LLM’s expertise and knowledge of the Telco domain, you get a head start, and the Telco LLM acts as an accelerator to get apps and services to production faster and more efficiently.

Improving Development Speed of LLM-based Services

In developing apps and services, we’ve seen up to a 50% reduction in the time to go from service idea to launching a user-facing service. What used to take 6 months can now be done in as little as 3 months!

Conclusion

There are many LLMs on the market, and new models come out on a daily basis. Benchmarks typically only look at performance on a narrow list of tasks, and these results, unfortunately, have very little correlation with LLM performance in real-world settings. For business cases, it’s important to look not only at LLM performance on tasks but also at speed, cost, and time to market. To help Telcos get a more holistic view of LLM performance, we’ve created the LLM Capacity Index, which includes the TES as well as metrics around speed and cost. We hope this framework helps Telcos make informed decisions about the right LLM(s) for their business use cases.
