Just being honest, I sometimes suffer from engineering arrogance. I’ll look at software that another company has built and think “that’s not that hard.” Well, nothing close to that thought crossed my mind after playing around with ChatGPT. As someone who has spent the last 12 years working on Natural Language Generation (NLG) technologies, I found its writing capabilities astonishing. If there were an engineering Nobel Prize, the OpenAI team should win it.
That said, ChatGPT has not (thankfully) actually achieved Artificial General Intelligence yet. There are several areas where it struggles compared to a smart human being, particularly when it comes to analyzing and reporting on numeric data sets. Conveniently for me, many of the areas where ChatGPT is weak are the same areas where the technology I have been working on is strong. In this post, I’m going to run through some of the weaknesses of ChatGPT when dealing with data, and also talk about an alternative software method that successfully deals with those challenges.
Problem #1: It's a Black Box that Makes Guesses
You want your reports to be accurate, and if they’re not accurate then you want to know why. Unfortunately, the technology underlying ChatGPT doesn’t allow for either consistent accuracy or easy debugging. While this is an oversimplification, ChatGPT essentially creates digital neurons that each try to understand some component of reading and writing text. When writing, these neurons collectively produce a probability distribution over which word to use next at any given point in a document. This means that if it is analyzing your data and writing a report, it might ‘guess’ wrong when trying to interpret or describe what’s going on. Small, subtle changes to the data can be enough to send it down the wrong pathway. Unfortunately, reports that are 98% accurate are usually not good enough.
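To make that concrete, here is a toy sketch (my own illustration, not anything resembling OpenAI’s actual code) of how probabilistic next-word selection can flip the meaning of a sentence when two candidate words are nearly tied:

```python
# Toy illustration only: a language model picks each next word by sampling
# from a probability distribution, so a near-tie can flip the output.
import random

# Hypothetical probabilities for the word that follows "Revenue was ..."
next_word_probs = {"up": 0.51, "down": 0.46, "flat": 0.03}

def pick_next_word(probs):
    # Weighted random choice: the same prompt can yield different words run to run.
    words, weights = zip(*probs.items())
    return random.choices(words, weights=weights, k=1)[0]

print("Revenue was", pick_next_word(next_word_probs))
# Usually "up", but sometimes "down" -- and a 98%-accurate report is not good enough.
```

A real model works over tens of thousands of possible tokens and billions of parameters, but the failure mode is the same: a close call gets resolved by probability rather than by logic.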
When ChatGPT does go wrong, the sheer complexity of its underlying technology makes it very hard to figure out why it failed. Billions of these neurons, connected by 175 billion parameters in the case of GPT-3 (the model behind the original ChatGPT), are involved in making ChatGPT run, and because they are potentially all involved in deciding each word in a report, there is no way to succinctly describe the path the computer took when it went from a blank page to a completed report. Asking ChatGPT to fully describe its thought process is akin to asking you how you processed the photons coming into your eyes. The mechanism is hopelessly opaque, even to the system that is doing the processing.
Human thinking is different. We start with well-defined concepts (“up”, “week”, “revenue”) and then mix them together to form new concepts (“revenue was up for the week”). We can then manipulate these concepts using logical operations and continue to combine concepts into bigger structures (like complex thoughts or written paragraphs). Crucially, we can apply this process of conceptual thinking to our own thought process, giving us the ability to explain how we came to a conclusion.
There is a way to mimic human-style thinking in software by using conceptual automata. These are pre-defined concepts that exist within an ontology and can be combined with each other to form larger concepts. Because they are not probabilistic, they always follow the same pathways when analyzing data, making sure that their final analysis is 100% accurate. Using sophisticated debugging tools, each of these pathways can be made visible to a narrative engineer, so they can very quickly determine exactly why any given sentence, phrase, or number appeared in a narrative.
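As a rough sketch of that idea (my own simplification, not InfoSentience’s actual implementation; the class and method names are hypothetical), the key properties are that concepts are pre-defined, combination is deterministic, and every composite concept records how it was built:

```python
# Illustrative sketch only -- not the real CAS. Concepts are predefined, combine
# deterministically, and record how they were built, so every phrase in the
# output can be traced back to an explicit derivation path.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Concept:
    name: str
    parts: tuple = field(default_factory=tuple)  # sub-concepts this one was built from

    def combine(self, other, relation):
        # Deterministic: the same inputs always yield the same composite concept.
        return Concept(f"{self.name} {relation} {other.name}", parts=(self, other))

    def trace(self, depth=0):
        # Debugging view: show the exact pathway that produced this concept.
        print("  " * depth + self.name)
        for part in self.parts:
            part.trace(depth + 1)

revenue = Concept("revenue")
up = Concept("up")
week = Concept("this week")
finding = revenue.combine(up, "was").combine(week, "for")

finding.trace()
# revenue was up for this week
#   revenue was up
#     revenue
#     up
#   this week
```

Because the derivation is an explicit data structure rather than a tangle of neuron activations, it can be inspected directly when something looks off.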
Problem #2: Struggling with Logical Operations
ChatGPT can play chess, despite having never seen a chess board. It’s actually not terrible, especially in the opening. While that certainly would make it seem like it can handle logical thinking, it’s really an illusion. ChatGPT is essentially a super-sophisticated auto-complete system. So, if you start off by asking the system for a chess move that comes after 1.e4, it might respond with 1...e5. That’s not because it understands the value of moving the pawn forward, but rather because it has read through the annotations of millions of chess games and knows that e5 often follows e4.
For as long as you play ‘book moves’ (those typically played in a chess opening), ChatGPT will keep humming along just fine. But once you get to the ‘middle game’, where you are now playing a unique contest, ChatGPT will start to struggle. Sometimes it will even suggest an illegal move, like moving one of your pieces onto a square already occupied by another of your pieces.
This is a problem when dealing with your data. While there are elements of your data that are not unique, the totality of information contained within your data sets creates a never-before-seen analysis question. Essentially, it’s one big ‘middle game’ in chess, where you can’t follow hard and fast rules anymore and instead have to rely on real logical thinking.
A Conceptual Automata System (CAS) solves this problem by incorporating logical operations directly into the foundation of the software. As I mentioned before, conceptual automata work by allowing multiple concepts to be combined into larger concepts. However, there isn’t really a bright-line difference between what we might call a ‘tangible’ concept, such as ‘revenue’, and a logical-operation concept, such as ‘last week’ or ‘double’. Therefore, when the CAS applies a logical transformation, such as changing ‘revenue’ to ‘revenue last week’, it simply creates a new concept that combines the tangible concept of revenue with the logical operation of moving the time period back one week.
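Here is a minimal sketch of that composition step (again my own illustration, with made-up data and hypothetical function names, not the real CAS): the logical operation ‘last week’ takes the tangible concept ‘revenue’ and returns a new, combined concept.

```python
# Illustrative sketch only: logical operations ("last week", "double") are
# themselves concepts that transform a tangible concept like "revenue".
weekly_revenue = {"2024-W20": 100_000, "2024-W21": 120_000}  # made-up sample data

def metric(name, series, week):
    return {"name": name, "series": series, "week": week}

def last_week(concept):
    # Shift the time reference back one week, producing a new combined concept.
    year, wk = concept["week"].split("-W")
    prev = f"{year}-W{int(wk) - 1}"  # simplified; a real system would handle year rollover
    return {**concept, "name": f"{concept['name']} last week", "week": prev}

def double(concept):
    doubled = {k: v * 2 for k, v in concept["series"].items()}
    return {**concept, "name": f"double {concept['name']}", "series": doubled}

revenue = metric("revenue", weekly_revenue, "2024-W21")
shifted = last_week(revenue)
print(shifted["name"], "=", shifted["series"][shifted["week"]])
# revenue last week = 100000
```

The output of a logical operation is just another concept, so operations can be chained (‘double revenue last week’) without any special casing.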
Human beings are proof that the potential ways of combining tangible and logical concepts are nearly infinite, as we can offer an analysis of almost any situation. While a CAS is not currently as flexible in its domain knowledge as ChatGPT, within an area where it has expertise it can mix concepts together with human-like fluidity. Because it understands all of the sub-components involved in creating larger-scale concepts, it maintains a fundamental understanding of its results, giving it the ability to write about them intelligently.
Problem #3: Not Adapting to New Information
ChatGPT has gobbled up fantastic amounts of written material. In fact, ChatGPT has ingested a huge share of the written material available on the internet, including blog posts, articles, and tweets, along with vast numbers of books. It needs that massive amount of scale precisely because it doesn’t think conceptually like human beings do. When we learn a new thing, we typically try to fit it into already existing concepts and then understand how the new thing is different. For example, if you had never heard of soccer but knew all about hockey, you would pretty quickly be able to understand the dynamics of the game by mapping the new soccer concepts on top of the ones you had for hockey. ChatGPT, on the other hand, derives something akin to a concept by looking at the interactions within massive amounts of information. These ‘quasi concepts’ can’t really be manipulated or merged with new information entering the system, as they can only be built by looking at the entire training set at once.
It takes a long time to go through the entire history of written content, so ChatGPT is trained over a set time period (usually several months) and then its model is fixed from that point forward. It might seem like it is adapting or learning when you chat with it, but in fact it is merely responding to you by applying some already existing aspect of its model to what you are saying. It cannot create new concepts until the entire model is retrained.
This is a problem, because one thing it hasn’t ingested (hopefully) is your internal reporting, or the internal reporting of any of your competitors. This means ChatGPT would be approaching your data, and your reporting needs and preferences, completely fresh. It could try to apply concepts it has already derived to your data, but it wouldn’t be able to create any new conceptual information. It would therefore struggle to incorporate feedback from you about what to look for in the data, how to weight the significance of different events, how to use jargon or unique metric names, and many other aspects of reporting on your data that require new knowledge.
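For contrast, here is a sketch of what incorporating new knowledge looks like in a concept-based system (hypothetical names and data, and my own simplification rather than any vendor’s actual API): a new metric name or piece of jargon is just a new entry in the ontology, usable immediately and without retraining anything.

```python
# Hypothetical sketch of the contrast: a concept-based system can register new
# jargon or metric definitions at runtime, with no model retraining required.
ontology = {"revenue": "total sales in dollars"}  # starting vocabulary

def learn_concept(name, definition, synonyms=()):
    # Add a brand-new concept (e.g. a company-specific metric name) immediately.
    ontology[name] = definition
    for synonym in synonyms:
        ontology[synonym] = definition  # jargon maps to the same underlying concept

learn_concept("net booked ARR", "annualized recurring revenue net of churn",
              synonyms=("NB-ARR",))
print(ontology["NB-ARR"])  # the new jargon is usable right away
```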
Not There Yet
ChatGPT is taking the world by storm, and for good reason. Its ability to understand written text and communicate on a human level is truly astounding, and marks a significant change in the history of human technology. That said, when it comes to writing reports based on data, it currently has the potential to make significant errors, is difficult to debug, struggles with logical operations, and has a hard time incorporating new information after it has been trained. Any of these by themselves would hamper ChatGPT’s ability to reliably analyze data and report on it. Taken collectively, they completely prevent ChatGPT from playing a significant role in automating data-based reporting in the near term.
In contrast, a CAS can deliver 100% accuracy, is easy to debug, can handle all basic logical operations (changing time spans, creating sets, arithmetic, etc.), and can be fairly quickly trained to report on data of any kind. For now at least, this makes it the ideal solution for automated reporting. Given that CAS is strong where ChatGPT is weak and vice versa, could merging the two technologies provide an even more powerful solution, and perhaps even get us closer to Artificial General Intelligence? My answer to that question is probably (!), so keep an eye out for future blog posts on that topic.
I think you can generalize this to basically anything ChatGPT (or Bard) kicks out right now. "That said, when it comes to writing reports based on data, it currently has the potential to make significant errors, is difficult to debug, struggles with logical operations, and has a hard time incorporating new information after it has been trained."
After trying to see whether it can compete with me on writing certain technical emails based on public data, crafting Java code, and handling other "mundane" details, I've found it gets a lot right, but even a small error makes the difference between it being helpful information and fantasy.
I have been enjoying reading ChatGPT's fantasies, and also seeing the effect those fantastical statements can have on people who don't realize that it's just informed guessing, sometimes slightly wrong and sometimes right.
Brilliant dissection of how and why InfoSentience's template-based generative models are orthogonal to LLMs and critical for so many important use cases.