Just being honest, I sometimes suffer from engineering arrogance. I’ll look at software that another company has built and think “that’s not that hard.” Well, nothing close to that thought crossed my mind after playing around with ChatGPT. As someone who has spent the last 12 years working on Natural Language Generation (NLG) technologies, I found its writing capabilities astonishing. If there were an engineering Nobel Prize, the OpenAI team should win.
That said, ChatGPT has not (thankfully) actually achieved General Artificial Intelligence yet. There are several areas where it struggles compared to a smart human being, particularly when it comes to analyzing and reporting on numeric data sets. Conveniently for me, many of the areas where ChatGPT is weak are the same areas where the technology I have been working on is strong. In this post, I’m going to run through some of the weaknesses of ChatGPT when dealing with data, and also talk about an alternative software method that successfully deals with those challenges.
Problem #1: It's a Black Box that Makes Guesses
You want your reports to be accurate, and if they’re not accurate then you want to know why. Unfortunately, the technology underlying ChatGPT doesn’t allow for either consistent accuracy or easy debugging. While this is an oversimplification, ChatGPT essentially creates digital neurons that each try to understand some component of reading and writing text. When writing, these neurons collectively come up with a probability of which words to use at any given point in creating a written document. This means that if it is analyzing your data and writing a report, it might ‘guess’ wrong when trying to interpret or describe what’s going on. Small, subtle changes to the data could be enough to make it go down the wrong pathway. Unfortunately, reports that are 98% accurate are usually not good enough.
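To make the ‘guessing’ concrete, here is a minimal Python sketch of picking a word from a probability distribution. The probabilities are invented for illustration and the function name `sample_next_word` is my own; a real language model computes its probabilities with a neural network over a very large vocabulary.

```python
import random

# Hypothetical next-word probabilities after the prompt "Revenue was".
# These numbers are made up for illustration, not taken from any real model.
next_word_probs = {"up": 0.55, "down": 0.40, "flat": 0.05}

def sample_next_word(probs, rng):
    """Pick one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_next_word(next_word_probs, rng) for _ in range(1000)]
counts = {word: samples.count(word) for word in next_word_probs}
print(counts)
```

The most likely word usually wins, but not always, and that is the crux of the problem for reporting: a small share of the time the ‘wrong’ word gets written down as if it were fact.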
When ChatGPT does go wrong, the sheer complexity of its underlying technology makes it very hard to figure out why it failed. There are billions of parameters, the adjustable connection weights between its digital neurons (175 billion in the case of GPT-3, the predecessor of the models behind ChatGPT), involved in making ChatGPT run, and because they are potentially all involved in deciding each word in a report, there is no way to succinctly describe the path the computer took when it went from a blank page to a completed report. Asking ChatGPT to fully describe its thought process is akin to asking you how you processed the photons coming into your eyes. The mechanism is hopelessly opaque, even to the system that is doing the processing.
Human thinking is different. We start with well-defined concepts (“up”, “week”, “revenue”) and then mix them together to form new concepts (“revenue was up for the week”). We can then manipulate these concepts using logical operations and continue to combine concepts into bigger structures (like complex thoughts or written paragraphs). Crucially, we can apply this process of conceptual thinking to our own thought process, giving us the ability to explain how we came to a conclusion.
There is a way to mimic human-style thinking in software by using conceptual automata. These are pre-defined concepts that exist within an ontology and can be combined with each other to form larger concepts. Because they are not probabilistic, they always follow the same pathways when analyzing data, making sure that their final analysis is 100% accurate. Using sophisticated debugging tools, each of these pathways can be made visible to a narrative engineer, so they can very quickly determine exactly why any given sentence, phrase, or number appeared in a narrative.
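As a toy contrast (a sketch of the general idea, not the actual system), the Python below applies fixed rules to two revenue numbers and records every step it takes. The function name `describe_revenue` and the trace format are assumptions made up for this example.

```python
def describe_revenue(this_week, last_week):
    """Return (sentence, trace) using fixed rules.

    The same inputs always produce the same sentence, and the trace lists
    every step that led to it. Assumes last_week is not zero.
    """
    trace = []
    change = this_week - last_week
    trace.append(f"change = {this_week} - {last_week} = {change}")

    if change > 0:
        direction = "up"
    elif change < 0:
        direction = "down"
    else:
        direction = "flat"
    trace.append(f"direction = {direction}")

    if direction == "flat":
        sentence = "Revenue was flat for the week."
    else:
        pct = abs(change) / last_week * 100
        trace.append(f"percent change = {pct:.1f}")
        sentence = f"Revenue was {direction} {pct:.1f}% for the week."
    return sentence, trace

sentence, trace = describe_revenue(110, 100)
print(sentence)
for step in trace:
    print("  because:", step)
```

If a number in the sentence ever looks wrong, the trace points at the exact step that produced it, which is the kind of visibility a probabilistic model cannot offer.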
Problem #2: Struggling with Logical Operations
ChatGPT can play chess, despite having never seen a chess board. It’s actually not terrible, especially in the opening. While that certainly would make it seem like it can handle logical thinking, it’s really an illusion. ChatGPT is essentially a super-sophisticated auto-complete system. So, if you start off by asking the system for a chess move that comes after 1.e4, it might respond with 1...e5. That’s not because it understands the value of moving your pawn forward, but rather because it has read through the annotations of millions of chess games and knows that e5 often follows e4.
For as long as you play ‘book moves’ (those typically played in a chess opening) ChatGPT will keep humming along great. But once you get to the ‘middle game’, where you are now playing a unique contest, ChatGPT will start to struggle. Sometimes, it will even suggest making an illegal move, like moving your own piece on top of another of your pieces.
This is a problem when dealing with your data. While there are elements of your data that are not unique, the totality of information contained within your data sets creates a never-before-seen analysis question. Essentially, it’s one big ‘middle game’ in chess, where you can’t follow hard and fast rules anymore and instead have to rely on real logical thinking.
A Conceptual Automata System (CAS) solves this problem by incorporating logical operations directly into the foundation of the software. As I mentioned before, conceptual automata work by allowing multiple concepts to be combined into larger concepts. However, there really isn’t a bright line difference between what we might refer to as a ‘tangible’ concept, such as ‘revenue’ and a logical operation concept such as ‘last week’ or ‘double’. Therefore, when the CAS applies a logical transformation, such as changing ‘revenue’ to ‘revenue last week’, it simply creates a new concept that combines the tangible concept of revenue with the logical operation of moving the time period back one week.
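Here is one way to picture that in Python. This is a minimal sketch under my own assumptions, not the real CAS internals: the `Concept` class, the `last_week` and `double` helpers, and the revenue figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Toy weekly revenue figures, oldest to newest (invented for illustration).
WEEKLY_REVENUE = [100.0, 120.0, 90.0]

@dataclass(frozen=True)
class Concept:
    name: str
    evaluate: Callable[[int], float]  # maps a week index to a value

# A 'tangible' concept.
revenue = Concept("revenue", lambda week: WEEKLY_REVENUE[week])

# Logical operations take a concept and return a new, larger concept.
# This sketch does not guard against asking for the week before week 0.
def last_week(concept: Concept) -> Concept:
    return Concept(f"{concept.name} last week",
                   lambda week: concept.evaluate(week - 1))

def double(concept: Concept) -> Concept:
    return Concept(f"double {concept.name}",
                   lambda week: 2 * concept.evaluate(week))

revenue_last_week = last_week(revenue)
double_revenue_last_week = double(revenue_last_week)

print(revenue_last_week.name, "=", revenue_last_week.evaluate(2))
print(double_revenue_last_week.name, "=", double_revenue_last_week.evaluate(2))
```

Notice that `last_week` and `double` never need to know what they are wrapping. Any concept with a value can be shifted in time or doubled, which is what lets a small set of parts combine into a very large number of larger concepts.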
Human beings are proof that the potential ways of combining tangible and logical concepts together are near infinite, as we can offer an analysis of almost any situation. While a CAS is not currently as flexible in its domain knowledge as ChatGPT, within an area that it has expertise it can mix together concepts with human-like fluidity. Because it understands all of the sub-components involved in creating larger scale concepts, it maintains a fundamental understanding of its results, giving it the ability to then write about it intelligently.
Problem #3: Not Adapting to New Information
ChatGPT has gobbled up fantastic amounts of written material. In fact, it has ingested a large share of the written material available on the internet, including blog posts, articles, and Tweets, along with a vast number of books. It needs that massive scale precisely because it doesn’t think conceptually like human beings do. When we learn a new thing, we typically try to fit it into already existing concepts and then understand how the new thing is different. For example, if you had never heard of soccer but knew all about hockey, you would pretty quickly be able to understand the dynamics of the game by mapping the new soccer concepts on top of the ones you had for hockey. ChatGPT, on the other hand, derives something akin to a concept by looking at the interactions of massive amounts of information. These ‘quasi concepts’ can’t really be manipulated or merged with new information entering the system, as they can only be built by looking at an entire training set at one time.
It takes a long time to go through the entire history of written content, so ChatGPT is trained over a set time period (usually several months) and then its model is fixed from that point forward. It might seem like it is adapting or learning when you chat with it, but in fact it is merely responding to you by applying some already existing aspect of its model to what you are saying. It cannot create new concepts until the entire model is retrained.
This is a problem, because one thing that it hasn’t ingested (hopefully) is your internal reporting, or the internal reporting of any of your competitors. This means ChatGPT would be approaching your data, and your reporting needs and preferences, anew. It could try to apply already existing concepts it had derived to your data, but it wouldn’t be able to create any new conceptual information. It would therefore struggle to incorporate feedback from you as to what to look for in the data, how to weight the significance of different events, how to use jargon or unique metric names, and many other aspects of reporting on your data that require new knowledge.
Not There Yet
ChatGPT is taking the world by storm, and for good reason. Its ability to understand written text and communicate on a human level is truly astounding, and marks a significant change in the history of human technology. That said, when it comes to writing reports based on data, it currently has the potential to make significant errors, is difficult to debug, struggles with logical operations, and has a hard time incorporating new information after it has been trained. Any of these by themselves would hamper ChatGPT’s ability to reliably analyze data and report on it. Taken collectively, they completely prevent ChatGPT from playing a significant role in automating data-based reporting in the near term.
In contrast, CAS can deliver 100% accuracy, is easy to debug, can handle all basic logical operations (changing time spans, creating sets, arithmetic, etc.), and can be fairly quickly trained to report on data of any kind. For now at least, this makes it the ideal solution for automated reporting. Given that CAS is strong where ChatGPT is weak and vice versa, could merging the two technologies provide an even more powerful solution, and perhaps even get us closer to General AI? My answer to the question is probably (!), so keep a look out for future blog posts on that topic.
I’ve written in this blog before about the importance of avoiding cookie-cutter narratives when reporting on data. The two main issues are that: (1) templates are not flexible enough to report on the unique outcomes involved in any data set, and (2) people reading the reports will start to tune them out when they see the same pieces of information in the same arrangement.
The need to have flexible and original narratives applies just as well to how you visualize data. Unfortunately, most data dashboards, such as Tableau or Power BI, are built to have a set of default visualizations relating to the data. For example, a sales report might default to showing gross sales over the past year. But what if ‘gross sales over the past year’ isn’t the big story? For instance, what if the big story was a steep rise in returns over the past three months?
The ideal solution is straightforward: ditch the ‘canned’ visuals and have software automatically highlight the visual information that is most important to the end user. These visuals can be paired with an automated narrative that also surfaces the most important information. Ideally, the narrative and visual elements of the report will be aligned, allowing the reader to quickly both read and see the things they need to know.
Using Conceptual Automata
This capability is possible if you use automated reporting software that uses conceptual automata. What the heck are conceptual automata? A full breakdown would require a long answer, but at a very high level, conceptual automata break every story or event within a dataset into a set of components. So, for a stock story such as ‘stock on a 5-day streak of beating the market’, the system understands this as the mix of its constituent parts: [streak] [of stock] [beating market] [for five days].
Because the Conceptual Automata System (“CAS”) understands how each sub-component combines to make up the full story, intelligence for how to visualize the story can be placed on the sub-components. This allows the CAS to ‘share’ the intelligence from one story with other stories that have the same sub-components, and also allows the system to visualize any combination of narrative information.
This is similar to how human beings think. After all, if you knew how to visualize a story like ‘stock on a 5-day streak of beating the market’, you would have no problem visualizing the story ‘stock on a 5-day streak of underperforming the market’, or a story like ‘healthcare stock on a 5-day streak of beating the average healthcare stock.’
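The bracketed breakdown of the streak story can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the dictionary layout, and the daily return figures are all hypothetical. The point is that the [streak] logic is written once and reused, whichever event it is counting.

```python
def trailing_streak(flags):
    """Length of the unbroken run of True values at the end of the list."""
    days = 0
    for flag in reversed(flags):
        if not flag:
            break
        days += 1
    return days

def beating(subject_return, benchmark_return):
    return subject_return > benchmark_return

def underperforming(subject_return, benchmark_return):
    return subject_return < benchmark_return

def find_streak_story(subject, event_name, event, subject_returns, benchmark_returns):
    """Combine the sub-components [streak] [of subject] [event] [for N days]."""
    flags = [event(s, b) for s, b in zip(subject_returns, benchmark_returns)]
    return {
        "pattern": "streak",
        "subject": subject,
        "event": event_name,
        "days": trailing_streak(flags),
    }

# Invented daily returns (in percent), oldest to newest.
stock = [0.1, -0.2, 0.5, 0.3, 0.4, 0.2, 0.6]
market = [0.2, 0.1, 0.1, 0.2, 0.3, 0.1, 0.5]

beat_story = find_streak_story("stock", "beating market", beating, stock, market)
lag_story = find_streak_story("stock", "underperforming market", underperforming, stock, market)
print(beat_story)
print(lag_story)
```

Swapping `beating` for `underperforming`, or the market for an average of healthcare stocks, changes one component and leaves the rest alone. Any charting logic attached to the "streak" component would carry over in the same way.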
Human-level flexibility allows a CAS to create charts that cover different time periods and can include multiple subjects. It can understand all the different ways a story could be visualized (line/pie/bar chart, table, etc.) and the benefits and drawbacks of each form factor. Some of those form factors might use more visual space, and the CAS has the capability to understand how to best make use of the available visual space to convey the most information. It might choose to have one large, very visually compelling chart, or two smaller charts, depending on the underlying importance of the information.
Telling the Visual Story
Because the CAS understands the conceptual underpinnings of each visualizer, it can go beyond just showing you the numbers like you’d see in a standard issue dashboard chart. Those charts might show you all the information, but they won’t necessarily make it easy to see why that data is compelling. This is where chart ‘Scribbles’ come into play. ‘Scribbles’ is a catch-all term for human touches that a CAS can add to a chart to make it easy for the viewer to quickly understand its importance. For example, when a chart is visualizing a metric having moved up in X out of the last Y periods, the chart could highlight the positive movement in green, while showing the negative movement in red. In other situations it might use arrows to point out key data points, or add a trend line to compare movement to.
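The green/red example is simple enough to sketch in Python. This is a hypothetical helper of my own (`scribble_colors`), not part of any charting library; it only works out which color each period-over-period move should get, and the resulting list could then be handed to whatever charting tool you use.

```python
def scribble_colors(values):
    """Return one color per period-over-period move:
    green for up, red for down, gray for no change."""
    colors = []
    for previous, current in zip(values, values[1:]):
        if current > previous:
            colors.append("green")
        elif current < previous:
            colors.append("red")
        else:
            colors.append("gray")
    return colors

# Invented metric values for five periods (four moves between them).
metric = [10, 12, 11, 11, 15]
colors = scribble_colors(metric)
ups = colors.count("green")
print(colors)
print(f"Up in {ups} out of the last {len(colors)} periods")
```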
These additions are not necessary of course, but they make it that much easier for decision makers to understand what they need to know. When reporting on data, good writing is all about allowing the reader to spend less effort understanding the report so that the reader can spend more effort thinking about the implications of the report. By holding the viewer's hand as they look at a chart, Scribbles allow a little section of the viewer's brain to relax, freeing up that brainpower to be used for something more valuable.
User Control
The CAS can default to showing the most compelling visuals, but why stop there? After all, the reader is the ultimate judge of what is important, so why not give them the ability to visualize any piece of information within a narrative? That’s exactly what a CAS interactive dashboard is able to do. The CAS can visualize any event within the data, so that allows end users, with the press of a button, to turn any sentence in a narrative into a chart or graph. This not only allows users to quickly visualize information but can also allow them to create charts and graphs to share as a part of a presentation. Alternatively, they can turn a sentence into a table if they want to view the data that the sentence was built from. That allows them to quickly dig into underlying data to understand what sub-components might need more examination.
The Whole Package
Taken collectively, concept-based visuals create a step change in how reports integrate visual information. Instead of pre-set charts and graphs, users see the charts and graphs that highlight the most important information they need to know. Those visualizations are then formatted in such a way that the user can quickly understand the significance of each visual element, whether that might mean highlighting sections of a chart or adding arrows and text. Finally, the user is given the freedom to turn any part of a report into a visual element, whether that be a chart, graph, or table.
The user can even turn any aspect of a report into an automated video…but that will have to be the subject of a blog post sometime in the future 😊
In the olden days, if you wanted to send out a report to your employees, or an update to your clients, or a letter intended for prospective customers, you only had two options:
Option #1: Mass Communication
You create one piece of content and send this off to everybody. If it was a report, then everybody got the same report. If it was a letter to your clients, then every client got the same letter. This has the benefit of being cheap and fast, allowing you to reach as many people as possible. The drawback is that everybody gets the same content regardless of their particular circumstances.
Option #2: Custom Communications
Every client, potential customer, or employee gets content that is specifically written for them. This has the advantage of making the communication maximally effective for each end user. The cost, of course, is that it takes time and money to create each communication. In many cases, it’s not feasible to write that many individual narratives, even with an outsized budget.
But what if you didn’t have to choose the least bad option? That is what’s possible with Mass Custom Communication (MCC), which takes the best parts of each option, allowing you to deliver insightful, impactful reports on a near infinite scale. But before we talk more about what MCC is, let’s talk about what it isn’t. MCC is not a form letter and it is not a ‘Mad Libs’ style narrative. We’ve all been the recipient of communications like that, and they barely register as being customized at all, let alone making us feel like they have been written just for us. This goes double if the narrative or report is part of an ongoing series of communications which all use the same template.
In order to create true MCC, you need two things: (1) a rich data set, and (2) Natural Language Generation (NLG) software that can truly synthesize that data set and turn it into a high-quality narrative. Let’s take these in order. First, you need a rich data set because you must have enough unique pieces of data, or combinations within that data, to write up something different for each end user. In essence, you need ‘too much’ information to fit into a template.
Once you have ‘too much’ information, you need high-quality synthesis from an AI NLG system. This system can go through thousands of data points to find the most relevant information for each end user. It then can automatically organize this information into a narrative with clear main points and interesting context, so that the end user is able to read something that feels unique and compelling to them. For example, a salesperson can get a weekly report that not only tells them about the top-line numbers for their sales this week, but also contextualizes those numbers with trends from their sales history and larger trends within the company.
Individual Outreach
A great example of the power of MCC comes from the world of fantasy sports. For those of you unfamiliar with how fantasy sports work, you and your friends each draft players within a given sport, and then your ‘team’ competes with other teams in the league. It’s sort of like picking a portfolio of stocks and seeing who can do the best.
People love playing fantasy sports, but because each team is unique, and because there are millions of fantasy players, people were never able to get stories about their fantasy league the same way that they get stories about the professional leagues. With the advent of MCC, suddenly they could, and CBS Sports decided to take advantage of it.
For the last 10 years, we’ve written up stories about what happened for every CBS fantasy team every week, creating game recaps, game previews, draft reports, power rankings, and many other content pieces, each with headlines, pictures, and other visual elements. This was actually infoSentience’s first product, so it is near and dear to my heart. We’ve now created over 200 million unique articles which give CBS players what we call a ‘front page’ experience, which covers their league using the same types of content (narratives, headlines, pictures) as the front page of a newspaper. Previously, they could only get a ‘back page’ experience that showed them columns of stats (yes, I realize I’m dating myself with this reference).
Critically, quality is key when it comes to making these stories work. Cookie-cutter templates are going to get real stale, real fast when readers see the same things over and over again each week. Personalization involves more than just filling in names and saying who won. It’s about finding the unique combinations of data and events that speak to what made the game interesting. It’s writing about how you made a great move coaching AND how that made the difference in your game AND how that means you are now the top-rated coach in your league.
This type of insight is what makes for compelling reading. Open rates for the CBS weekly recap emails we send out are the highest of any emails CBS sends to their users. If you are sending out weekly reports to each of your department managers, or monthly updates for each of your clients, they have to be interesting or they’re not going to be read. If each report is surfacing the most critical information in a fresh, non-repetitive manner, then end users will feel compelled to read them.
Long Tail Reporting
Reports don’t necessarily have to be targeted at individuals in order to achieve scale. You might also just need to write a lot of reports using many subsets of your data. These reports might be targeted towards groups, or just posted onto a website for anybody to read. This might be better labeled Mass ‘Niche’ Communications, but it definitely falls under the MCC umbrella.
CBS has not only taken advantage of MCC for their fantasy product, but has also applied it to live sports. They leverage our automated reporting technology to supplement their newsroom by writing stories they otherwise wouldn’t have time for. Being one of the major sports sites in the US, they obviously have plenty of quality journalists. That still doesn’t mean that they are capable of covering every single NFL, NBA, college basketball, college football, and European soccer match, let alone writing up multiple articles for each game, covering different angles such as recapping the action, previewing the game, and covering the gambling lines.
That’s where our technology comes in. We provide a near limitless amount of sports coverage for CBS Sports at high quality. That last part is once again key. CBS Sports is not some fly-by-night website looking to capitalize on SEO terms by throwing up ‘Mad-Libs’ style, cookie-cutter articles. These write-ups need to have all the variety, insight, and depth that a human-written article would have, and that’s what we’ve delivered. For example, take a look at this college football game preview. If you just stumbled across that article you would have no idea it was written by a computer, and that’s the point.
Our work for IU Health is another example of this type of reporting. We automatically write and update bios for every doctor in their network, using information such as their education history, specializations, locations, languages, and many other attributes. We even synthesize and highlight positive patient reviews. Like with CBS, these bios would be too difficult to write up manually. There are thousands of doctors within IU’s network, and dozens come and go every month. By automating the bios, IU not only saved themselves a great deal of writing, but also made sure that all their bios are up to date with the latest information, such as accurate locations and patient ratings.
Your Turn?
If you are in a situation where you are either (A) not creating all the content you need because you don’t have the workforce, or (B) using generic reports/communications when you would really benefit from having a custom message, then hopefully this article opened your eyes to a new possibility. Mass Custom Communication allows you to have your cake and eat it too. Maybe your use case can be the one I talk about in the next version of this article.
[Note: All of the following concerns AI’s ability to write about specific data sets, something very different from ChatGPT-style natural language generators]
We all know the basics for why good writing is important in business. Decision makers want to read reports that are accurate, impactful, and easy to understand. With those qualities in mind, let’s take a look at this short paragraph:
Widgets were down this month, falling 2.5%. They were up the last week of the month, rising 4.5%.
Assuming the numbers are correct, the sentences would certainly qualify as being ‘accurate’. The metrics mentioned also seem like they would be ‘impactful’ to a widget maker. Is it ‘easy to understand’? Here’s where it gets a bit trickier. Both sentences by themselves read fine, but when you put them together, there’s something missing. The writing comes across as robotic. Ideally, we’d want the sentences to read more like this:
Widgets were down this month, falling 2.5%. The last week of the month was a bright spot though, as sales rose 4.5%.
This is conveying the exact same basic information as before: (1) monthly sales down, and (2) last week up. But the sentence contains a critical new word: though. Transition words like ‘though’, ‘however’, ‘but’, and so on play a critical role in helping our brains not just download data, but rather tell the story of what is happening in the data. In this case, the story is about how there’s a positive sign in the time series data, which is also emphasized by using the ‘bright spot’ language.
If a manager was just reading the first example paragraph, it’s likely that they would be able to fill in the missing info. After reading the second sentence their brain would take a second and say “oh, that’s a good sign going forward in the midst of overall negative news.” But having to make the reader write the story in their head is not cost free.
After all, it's not enough to just understand a report. Whether it’s an employee, C-suite executive, client, or anybody else, the point of a report is to give someone the information they need to make decisions. Those decisions are going to require plenty of thinking on their own, and ideally that is what you are spending your brainpower on when reading a report.
This brings me to my alternate definition of what ‘good writing’ really means: it’s when you are able to devote your brainpower to the implications of the report rather than spending it on understanding the report.
So what are the things that we need to allow our brain to relax when it comes to understanding data analysis? There are three main factors, which I call “The Three T’s”, and they are: Trust, Themes, and Transitions.
All of these things are hard enough for a human data analyst or writer to pull off (am I extra nervous about my writing quality for this post? Yes, yes I am). For most Natural Language Generation (NLG) AI systems, nailing all Three T’s is downright impossible. Each factor presents its own unique challenges, so let’s see why they are so tough.
Trusting AI
The bare minimum for establishing trust is to be accurate, and that’s one thing that computers can do very easily. Better, in fact, than human analysts. The other key to establishing trust is making sure every bit of critical information will make it into the report, and this is where AIs have traditionally struggled. This is because most NLGs in the past used some sort of template to create their narratives. This template might have a bit of flexibility to it, so a paragraph might look like [1 – sales up/down for month] [2 – compare to last year (better or worse?)] [3 – estimated sales for next month]. Still, templates like that aren’t nearly flexible enough to tell the full story. Sometimes the key piece of information is going to be that there was a certain sub-component (like a department or region) that was driving the decrease. Other times the key context is going to be about how the movement in the subject of the report was mirrored by larger outside groups (like the market or economic factors). Or, the key context could be about the significance of the movement of a key metric, such as whether it has now trended down for several months, or that it moved up more this month than it has in three years.
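To see the limitation in action, here is a small Python sketch of the three-slot template described above. The template wording, the field names, and the numbers are all invented for illustration.

```python
# A three-slot template: [sales up/down] [vs. last year] [next month estimate].
TEMPLATE = (
    "Sales were {direction} {pct:.1f}% for the month, {comparison} than the "
    "same month last year. Estimated sales for next month are ${forecast:,}."
)

# Invented facts found in the data. Note the 'driver' entry: arguably the
# most important story, but the template has no slot for it.
facts = {
    "monthly_change_pct": -2.5,
    "better_than_last_year": False,
    "forecast": 1250000,
    "driver": "the West region accounted for most of the decline",
}

def fill_template(facts):
    change = facts["monthly_change_pct"]
    return TEMPLATE.format(
        direction="up" if change >= 0 else "down",
        pct=abs(change),
        comparison="better" if facts["better_than_last_year"] else "worse",
        forecast=facts["forecast"],
    )

report = fill_template(facts)
print(report)
```

The output is accurate as far as it goes, yet the regional driver never appears. You could add a fourth slot for it, but then you need a fifth for market context, a sixth for multi-month trends, and so on.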
There’s no way to create a magic template that will somehow always include the most pertinent information. The way to overcome this problem is to allow the AI system to work the same way a good human analyst does: by first analyzing every possible event within the data and then writing up a report that includes all of those events. But having this big list of disjointed events makes it that much more difficult to execute on the second ‘T’: Themes.
Organizing the Story
How does the AI go from a list of interesting events to a report that has a true narrative through line, with main points followed by interesting context? The only way to do this is by embedding conceptual understanding into the AI NLG system. The system has to be able to understand how multiple stories can come together to create a theme. It has to be able to understand which stories make sense as a main point and which stories are only interesting as context to those main points. It also needs to be able to fit those stories into a narrative of a given length, thereby requiring some stories to be told at a high level rather than going into all of the details.
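A heavily simplified Python sketch of that organizing step might look like the following. Everything here is an assumption for illustration: the `Event` class, the idea of grouping by a shared subject, and the hand-assigned importance scores. A real system would need a far richer notion of how stories relate to each other.

```python
from dataclasses import dataclass

@dataclass
class Event:
    text: str
    subject: str
    importance: float  # hypothetical score, higher means more newsworthy

def outline(events, max_sentences):
    """Group events into themes by subject, lead each theme with its most
    important event, and stop once the length budget is used up."""
    ranked = sorted(events, key=lambda e: e.importance, reverse=True)
    themes = {}  # subject -> events, most important first
    for event in ranked:
        themes.setdefault(event.subject, []).append(event)

    plan = []
    for group in themes.values():
        for position, event in enumerate(group):
            if len(plan) == max_sentences:
                return plan
            role = "main" if position == 0 else "context"
            plan.append((role, event.text))
    return plan

events = [
    Event("Sales fell 2.5% this month", "sales", 0.9),
    Event("Returns rose for a third straight month", "returns", 0.6),
    Event("Sales rose 4.5% in the last week of the month", "sales", 0.7),
    Event("Returns were highest in the West region", "returns", 0.3),
]

plan = outline(events, max_sentences=3)
for role, text in plan:
    print(f"{role:>7}: {text}")
```

With a budget of three sentences, the least important detail is the one that gets dropped, and each theme still opens with its main point.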
Writing the Story
Just as transitions are a small-scale version of themes, the challenges of dealing with transitions are a small-scale version of the challenges of dealing with themes. That doesn’t make them any easier, unfortunately. The issue with themes is structural: how do I organize all the key pieces of information? The problem with transitions comes about after everything has been organized and the AI has to figure out how to write everything up.
Again, the solution is to have an AI system that understands conceptually what it is writing about. If it understands that one sentence is ‘good’ and the next sentence is ‘bad’, then it is on the path to being able to include the transitions needed to make for good writing. Of course, it’s not quite that simple, as there are many conceptual interactions taking place within every sentence. For example, starting out a sentence with “However, …” would read as robotic if happening twice within the span of a few sentences. Therefore, the AI system needs to find a way to take into account all of the conceptual interactions affecting a given sentence and still write it up properly.
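As a toy illustration of those two ideas (adding a contrast word when the tone flips, and not reusing the same opener back to back), here is a short Python sketch. The tone labels, the list of openers, and the function name are my own assumptions, and lower-casing the first letter after an opener would mangle a sentence that starts with a proper noun.

```python
# Openers to rotate through when the tone flips (an arbitrary starter list).
CONTRAST_OPENERS = ["However,", "That said,", "On the other hand,"]

def add_transitions(sentences):
    """sentences: list of (text, tone) pairs, where tone is 'good' or 'bad'.
    Prefix a contrast opener whenever the tone differs from the previous
    sentence, cycling through the openers to avoid immediate repeats."""
    written = []
    previous_tone = None
    flips = 0
    for text, tone in sentences:
        if previous_tone is not None and tone != previous_tone:
            opener = CONTRAST_OPENERS[flips % len(CONTRAST_OPENERS)]
            flips += 1
            text = f"{opener} {text[0].lower()}{text[1:]}"
        written.append(text)
        previous_tone = tone
    return " ".join(written)

paragraph = add_transitions([
    ("Widgets fell 2.5% this month.", "bad"),
    ("Sales rose 4.5% in the final week.", "good"),
    ("Returns also climbed.", "bad"),
])
print(paragraph)
```

Even this tiny version shows why the real problem is hard: the right transition depends on what came before, and a sentence only has one opening slot for everything that wants to go there.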
Have no Fear!
These are all complex challenges, but thankfully infoSentience has built techniques that can handle The Three T’s with no problems. Whether it’s a sales report, stock report, sports report, or more, infoSentience can write it up at the same quality level as the best human writers. Of course, we can also do it within seconds and at near infinite scale (Bill Murray voice: “so we got that going for us”).