20/3/2024

We tested GPT for managing telecom projects and it disappointed us (for now...).

Let me first clarify that this article will probably be obsolete in months, or maybe days, considering the huge advances coming from our friends at OpenAI, AWS, Google, etc.

Last week at Sytex we had a Hackweek, where we typically take a topic and work on it together. The idea is to learn what a technology allows us to do and try to add it to our product. We had been wanting to see what can be done with ChatGPT/GPT-3.5/4 for some time, so we scheduled it and started working with great expectations.

We came up with ideas for using AI to build reports such as "Which site has the fewest maintenance problems?" or "How many installations do I have to do in Manaus?". We also thought it could help simplify the search for information, recommend routine activities and several other things.

To our surprise, we very quickly ran into complications that, for the time being, make it difficult to implement AI in an activity management platform such as Sytex. These problems are:

  1. GPT is (very) expensive.
  2. GPT is (very) slow.
  3. It is not possible to provide context (cheaply).
  4. GPT does not remember anything.

We will now review each of these points.

But first!

How the OpenAI API works

GPT is an LLM (Large Language Model), i.e. its intelligence lies in guessing words to answer a question. Guessing is a probability game where it considers the question, the context, its training, etc., and proposes text. This is fantastic and incredible, but it is not, for the moment, an intelligence that actually interprets or understands the question you are asking it.

Bottom line: if you ask it for a story about Messi written by Shakespeare, it looks for words and structures it typically saw in Shakespeare's texts, and mixes them with texts that typically form a story, about the world's greatest player.

The first thing to know about the API is that everything works through text, just like ChatGPT. You don't send it parameters; you must send it a prompt (descriptive text) with what you want it to do. The trick is to put that text together in the best possible way to avoid unwanted behavior. We already knew this from ChatGPT, and the API works the same way.

On the other hand, it is possible to guide GPT with the specific action we want it to do upon a user's request. For this, a "System Prompt" must be defined that describes what GPT should do with what the user requests. It is another Prompt, but hidden from the user.

For example, a System Prompt could be: "The user will state a set of data he wants to search for and you will have to tell me which Sytex filters to use. The possible filters are x, y, z...".

Then, in response to a user's request by text, for example:

"I want to see all the tasks completed last week on project X."

GPT will respond through the API with: "The user wants to use the following filters...".

That's why improving the System Prompt is key to getting GPT to respond exactly as we want. In this case, the System Prompt could be: "The user will send a request for information that he wants to search and you will have to tell me which filters to use in Sytex, WITHOUT EXPLANATION", and it will answer with "Completion_date = last_week; Project=X".
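As a reference, here is a minimal sketch of what such a call looks like with the openai Python package (v1-style client). The model name, the System Prompt wording and the filter names are just our illustration, not Sytex's actual implementation:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SYSTEM_PROMPT = (
        "The user will send a request for information he wants to search "
        "and you will have to tell me which filters to use in Sytex, "
        "WITHOUT EXPLANATION. Possible filters: completion_date, project, site."
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "I want to see all the tasks completed last week on project X."},
        ],
        temperature=0,  # we want a precise, order-like answer, not creative text
    )

    print(response.choices[0].message.content)
    # Hopefully something like: "Completion_date = last_week; Project=X"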

Here is the first alarm: GPT is excellent at putting together new texts, but it does not seem ideal for precise results or commands, which is what we typically expect from any work platform.

Now, on to the problems:

1- GPT is (very) expensive

Each call to the OpenAI API generates a cost which depends on, among other things:

  • The model you use (GPT-3.5, GPT-4, etc.)
  • The amount of information you send
  • Fine-tuning of the model

This cost is not at all negligible when you have a lot of users. So, we thought, if we are going to use it, let it be for an incredible functionality for our users, one that is really worth the cost.

We asked ChatGPT to explain the cost structure better, using the smallest API call we made during those days, and it summarized it like this:

First, let's calculate the cost of each type of token in U.S. dollars:

ChatGPT-4 Prompts: 360 tokens * ($0.03 / 1000 tokens) = $0.0108

Completions: 168 tokens * ($0.06 / 1000 tokens) = $0.01008

Text-embedding-ada-002-v2: 22 tokens * ($0.0004 / 1000 tokens) = $0.0000088

Now, let's add up the costs of the three types of tokens:

$0.0108 + $0.01008 + $0.0000088 = $0.0208888

The total cost in U.S. dollars is approximately $0.0209.

Thus, the ChatGPT-4 API query that consumed 550 tokens with the specified costs represents approximately $0.0209 in U.S. dollars.
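The same arithmetic as a tiny helper, using the per-1K-token prices quoted above (the prices we saw at the time of our tests; treat them as an assumption, since OpenAI's pricing changes):

    # Per-1K-token prices (USD) used in the breakdown above
    PRICES = {
        "gpt-4-prompt": 0.03,
        "gpt-4-completion": 0.06,
        "text-embedding-ada-002-v2": 0.0004,
    }

    def cost(tokens: int, price_per_1k: float) -> float:
        return tokens * price_per_1k / 1000

    total = (
        cost(360, PRICES["gpt-4-prompt"])
        + cost(168, PRICES["gpt-4-completion"])
        + cost(22, PRICES["text-embedding-ada-002-v2"])
    )
    print(f"${total:.4f}")  # -> $0.0209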

2- GPT is slooooow

Each GPT-4 API call took us between 7 and 15 seconds. With GPT-3.5 the times improve substantially, but the results are less accurate.

If you need the AI to optimize processes or do some critical task, you may be willing to wait that long, but if it is just an accessory functionality, such as activating search filters, you may choose to do it yourself.
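Measuring this is straightforward; a sketch of how we'd time a call, with the same hypothetical setup as above:

    import time
    from openai import OpenAI

    client = OpenAI()

    start = time.perf_counter()
    client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Which filters should I use?"}],
    )
    print(f"GPT-4 call took {time.perf_counter() - start:.1f} s")  # 7-15 s in our tests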

3- It is not possible to provide context (cheaply).

This was, for me, the most disappointing point. You would expect the model to learn from your own data or processes so it can propose activities or information in a precise and integrated way, but no. Either you send it a very long System Prompt, which is super expensive, or you do what is called "fine-tuning" of the model.

Fine-tuning means teaching the model the answers you expect to receive when you ask it a question. The model learns from those examples and responds with more of "your" context.

The problem is that to do a good fine-tuning you need to send a lot of data, and this, besides having its own cost, makes each query more expensive!

That is, OpenAI charges you more for using your own fine-tuned model.
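For reference, fine-tuning (at least for the chat models that supported it when we tested, such as gpt-3.5-turbo) works by uploading a JSONL file of example conversations and creating a training job. A minimal sketch with invented Sytex-style examples:

    import json
    from openai import OpenAI

    client = OpenAI()

    # Each line is one example conversation: what the user asks, what we want back.
    examples = [
        {"messages": [
            {"role": "system", "content": "Translate the user's request into Sytex filters."},
            {"role": "user", "content": "Tasks finished last week on project X"},
            {"role": "assistant", "content": "Completion_date = last_week; Project=X"},
        ]},
        # ...in practice you need many more examples for the model to learn anything useful
    ]

    with open("sytex_filters.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

    uploaded = client.files.create(file=open("sytex_filters.jsonl", "rb"), purpose="fine-tune")
    client.fine_tuning.jobs.create(training_file=uploaded.id, model="gpt-3.5-turbo")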

Ergo, the idea of "we want users to be able to ask which team has the highest workload" is not feasible, since we would have to send the model so much information that the cost would make it unviable. It is better to build configurable reports where you can filter and assemble what you need, without AI.

This is where there will be more news in the coming days/weeks/months. GPT-4 supports up to 32K tokens of context, which is quite a lot for the contexts we were testing. The problem (see point 1) is that those context tokens are priced, and not cheaply. But we would also expect the price to come down as it gains scale and more efficient generation techniques are found.
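A quick way to see why stuffing context into the prompt gets expensive is to count its tokens. A sketch using the tiktoken package; the context text and the price are assumptions for illustration:

    import tiktoken

    PROMPT_PRICE_PER_1K = 0.03  # the GPT-4 prompt-token price used above

    # Imagine dumping thousands of tasks/teams/sites into the System Prompt as context
    context = "task 1: install antenna at site MN-042, team North, due 2023-05-12\n" * 3000

    enc = tiktoken.encoding_for_model("gpt-4")
    n_tokens = len(enc.encode(context))

    # This may already exceed the model's context window, and it is paid on every query
    print(f"{n_tokens} context tokens -> ${n_tokens * PROMPT_PRICE_PER_1K / 1000:.2f} per query")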

4- GPT does not remember anything

A la Memento, each new interaction with GPT is a new beginning. It doesn't remember what was done for the user, or other things the same model has already done for you.

Each new request is a new adventure, and that loses value if a user has to start from scratch on every use. One option, of course, would be to resend with each new query the whole back-and-forth the user has had with the API; that is possible and it would work, but as we have already seen, it would be very expensive!
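That workaround looks like this in practice: you keep the conversation yourself and resend it every time, so the prompt (and the bill) grows with each turn. A sketch under the same hypothetical setup as before:

    from openai import OpenAI

    client = OpenAI()

    # The API is stateless: "memory" is just the message list we resend each time.
    history = [{"role": "system", "content": "Translate the user's request into Sytex filters."}]

    def ask(user_text: str) -> str:
        history.append({"role": "user", "content": user_text})
        response = client.chat.completions.create(model="gpt-4", messages=history)
        answer = response.choices[0].message.content
        history.append({"role": "assistant", "content": answer})  # keep it for the next turn
        return answer

    ask("Tasks completed last week on project X")
    ask("And only the ones in Manaus?")  # only works because we resent the first exchange
    # Every turn resends the full history, so prompt tokens (and cost) keep growing.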

Some conclusions

GPT is an incredible and magical tool that helps us to think, to generate texts (I wrote this one by myself!), to develop ideas and much, much more.

Perhaps this LLM approach is not the one that solves the need for consistent, context-specific, industry- or company-specific results. Or perhaps the ecosystem will continue to solve these issues and the next models will be faster, more efficient and accurate.

It will still be a while before we start seeing AI collaborating with our own information. How long? Years? Months? Days? We don't know. What we do know is that AI is here to change the world, and together we will see how to make it empower us and improve our lives.

Now that we have some experience using the tool and a better understanding of its capabilities and limitations, we will look for the use cases where it could become a useful Sytex feature for our users.