[Note: All of the following concerns AI’s ability to write about specific data sets, something very different from ChatGPT-style natural language generators]
We all know the basics of why good writing is important in business. Decision makers want to read reports that are accurate, impactful, and easy to understand. With those qualities in mind, let’s take a look at this short paragraph:
Widgets were down this month, falling 2.5%. They were up the last week of the month, rising 4.5%.
Assuming the numbers are correct, the sentences would certainly qualify as being ‘accurate’. The metrics mentioned also seem like they would be ‘impactful’ to a widget maker. Is it ‘easy to understand’? Here’s where it gets a bit trickier. Both sentences by themselves read fine, but when you put them together, there’s something missing. The writing comes across as robotic. Ideally, we’d want the sentences to read more like this:
Widgets were down this month, falling 2.5%. The last week of the month was a bright spot though, as sales rose 4.5%.
This conveys the exact same basic information as before: (1) monthly sales down, and (2) last week up. But the new version contains a critical new word: though. Transition words like ‘though’, ‘however’, ‘but’, and so on play a critical role in helping our brains not just download data, but rather follow the story of what is happening in the data. In this case, the story is that there’s a positive sign in the time series data, which is also emphasized by the ‘bright spot’ language.
If a manager were just reading the first example paragraph, it’s likely that they would be able to fill in the missing info. After reading the second sentence, their brain would take a second and say “oh, that’s a good sign going forward in the midst of overall negative news.” But making the reader write the story in their head is not cost-free.
After all, it's not enough to just understand a report. Whether it’s an employee, C-suite executive, client, or anybody else, the point of a report is to give someone the information they need to make decisions. Those decisions are going to require plenty of thinking on their own, and ideally that is what you are spending your brainpower on when reading a report.
This brings me to my alternate definition of what ‘good writing’ really means: it’s when you are able to devote your brainpower to the implications of the report rather than spending it on understanding the report.
So what does our brain need in order to relax when it comes to understanding data analysis? There are three main factors, which I call “The Three T’s”: Trust, Themes, and Transitions.
All of these things are hard enough for a human data analyst or writer to pull off (am I extra nervous about my writing quality for this post? Yes, yes I am). For most Natural Language Generation (NLG) AI systems, nailing all Three T’s is downright impossible. Each factor presents its own unique challenges, so let’s see why they are so tough.
Trusting AI
The bare minimum for establishing trust is to be accurate, and that’s one thing that computers can do very easily. Better, in fact, than human analysts. The other key to establishing trust is making sure every bit of critical information makes it into the report, and this is where AI systems have traditionally struggled. This is because most NLG systems in the past used some sort of template to create their narratives. This template might have a bit of flexibility to it, so a paragraph might look like [1 – sales up/down for month] [2 – compare to last year (better or worse?)] [3 – estimated sales for next month]. Still, templates like that aren’t nearly flexible enough to tell the full story. Sometimes the key piece of information is going to be that a certain sub-component (like a department or region) was driving the decrease. Other times the key context is going to be how the movement in the subject of the report was mirrored by larger outside groups (like the market or economic factors). Or the key context could be the significance of the movement of a key metric, such as whether it has now trended down for several months, or that it moved up more this month than it has in three years.
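To make the limitation concrete, here is a minimal sketch of the kind of three-slot template described above. The function name, parameter names, and sample numbers are made up for illustration and are not taken from any particular NLG product.

```python
def template_report(sales_change_pct, vs_last_year_pct, next_month_estimate):
    """Fill a fixed three-slot template; every report has the same shape."""
    # Slot 1: sales up/down for the month.
    direction = "up" if sales_change_pct >= 0 else "down"
    # Slot 2: better or worse than last year.
    comparison = "better" if vs_last_year_pct >= 0 else "worse"
    # Slot 3: estimated sales for next month.
    return (
        f"Sales were {direction} {abs(sales_change_pct):.1f}% this month. "
        f"That is {abs(vs_last_year_pct):.1f}% {comparison} than the same month last year. "
        f"Estimated sales for next month are {next_month_estimate:,} units."
    )


# Illustrative inputs only.
report = template_report(-2.5, 1.0, 12000)
print(report)
```

Whatever the data looks like, the output always has exactly these three sentences. If one region drove the entire decline, the template has no slot to say so.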
There’s no way to create a magic template that will somehow always include the most pertinent information. The way to overcome this problem is to allow the AI system to work the same way a good human analyst does: by first analyzing every possible event within the data and then writing up a report that includes all of those events. But having this big list of disjointed events makes it that much more difficult to execute on the second ‘T’: Themes.
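As a rough sketch of what “analyzing every possible event” could look like, the snippet below scans a monthly series for three kinds of events mentioned earlier: the latest change, a multi-month streak, and an unusually large move. The `Event` class, the three-month streak threshold, and the sample values are all assumptions made for this example, and the series is assumed to contain no zeros.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str
    detail: str


def find_events(monthly_values):
    """Scan a list of monthly values (oldest first) for noteworthy events."""
    events = []
    # Month-over-month percentage changes (assumes no value is zero).
    changes = [
        (curr - prev) / prev * 100
        for prev, curr in zip(monthly_values, monthly_values[1:])
    ]
    if not changes:
        return events

    latest = changes[-1]
    direction = "up" if latest >= 0 else "down"
    events.append(Event("monthly_change", f"{direction} {abs(latest):.1f}% this month"))

    # Streak: consecutive months moving in the same direction as the latest one.
    streak = 0
    for change in reversed(changes):
        if (change >= 0) == (latest >= 0):
            streak += 1
        else:
            break
    if streak >= 3:  # the threshold is an arbitrary choice for this sketch
        events.append(Event("streak", f"{direction} for {streak} straight months"))

    # Record move: the latest change is the largest in magnitude in the series.
    if len(changes) > 1 and abs(latest) > max(abs(c) for c in changes[:-1]):
        events.append(Event("record_move", f"largest monthly move in {len(changes)} months"))
    return events


# Illustrative data only.
events = find_events([100, 98, 97, 95, 90])
for event in events:
    print(event.kind, "->", event.detail)
```

A real system would check many more event types, but even this small version produces a flat list with no indication of which item is the headline.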
Organizing the Story
How does the AI go from a list of interesting events to a report that has a true narrative through line, with main points followed by interesting context? The only way to do this is by embedding conceptual understanding into the AI NLG system. The system has to be able to understand how multiple stories can come together to create a theme. It has to be able to understand which stories make sense as a main point and which stories are only interesting as context to those main points. It also needs to be able to fit those stories into a narrative of a given length, which requires some stories to be told at a high level rather than going into all of the details.
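One simplified way to picture this step is sketched below: stories are grouped by theme, the strongest story in each theme becomes a main point, and any leftover sentence budget goes to the strongest supporting stories. The dictionary keys, the importance scores, and the sample stories are hypothetical, and real conceptual understanding involves far more than sorting by a score.

```python
from collections import defaultdict


def organize(stories, max_sentences):
    """Group stories by theme and fit main points plus context into a budget.

    Each story is a dict with hypothetical keys 'theme', 'importance', 'text'.
    """
    max_sentences = max(0, max_sentences)
    by_theme = defaultdict(list)
    for story in stories:
        by_theme[story["theme"]].append(story)

    # Within each theme, the most important story becomes the main point.
    groups = [
        sorted(group, key=lambda s: s["importance"], reverse=True)
        for group in by_theme.values()
    ]
    # Themes are ordered by the importance of their main point.
    groups.sort(key=lambda group: group[0]["importance"], reverse=True)
    groups = groups[:max_sentences]

    outline = [{"main": g[0]["text"], "context": []} for g in groups]

    # Spend any leftover budget on the strongest supporting stories.
    budget = max_sentences - len(outline)
    supporting = [
        (story, index) for index, g in enumerate(groups) for story in g[1:]
    ]
    supporting.sort(key=lambda pair: pair[0]["importance"], reverse=True)
    for story, index in supporting[:budget]:
        outline[index]["context"].append(story["text"])
    return outline


# Illustrative stories only; scores and region name are made up.
stories = [
    {"theme": "sales", "importance": 0.9, "text": "Monthly sales fell 2.5%."},
    {"theme": "sales", "importance": 0.6, "text": "The last week rose 4.5%."},
    {"theme": "regions", "importance": 0.7, "text": "The West region drove the decline."},
    {"theme": "sales", "importance": 0.2, "text": "Tuesday sales were flat."},
]
outline = organize(stories, max_sentences=3)
print(outline)
```

With a budget of three sentences, the two main points are kept, one piece of context survives, and the least important story is dropped entirely.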
Writing the Story
Just as transitions are a small-scale version of themes, the challenges of dealing with transitions are a small-scale version of the challenges of dealing with themes. That doesn’t make them any easier, unfortunately. The issue with themes is structural: how do I organize all the key pieces of information? The problem with transitions comes about after everything has been organized and the AI has to figure out how to write everything up.
Again, the solution is to have an AI system that understands conceptually what it is writing about. If it understands that one sentence is ‘good’ and the next sentence is ‘bad’, then it is on the path to being able to include the transitions needed to make for good writing. Of course, it’s not quite that simple, as there are many conceptual interactions taking place within every sentence. For example, starting a sentence with “However, …” would read as robotic if it happened twice within the span of a few sentences. Therefore, the AI system needs to find a way to take into account all of the conceptual interactions affecting a given sentence and still write it up properly.
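A toy version of this idea is sketched below: each sentence carries a ‘good’ or ‘bad’ label, a transition is added whenever the label flips, and the transition phrases rotate so that “However,” does not appear twice in a row. The labels, the phrase list, and the third sample sentence are assumptions for illustration, and lowercasing the first letter after a transition is a simplification that would mangle proper nouns.

```python
def add_transitions(sentences):
    """Join (text, sentiment) pairs, marking sentiment flips with a transition.

    sentiment is 'good' or 'bad'; each text is assumed to be non-empty.
    """
    transitions = ["However,", "That said,", "On the other hand,"]
    used = 0
    output = []
    previous = None
    for text, sentiment in sentences:
        if previous is not None and sentiment != previous:
            # Rotate through the phrases so the same one is not repeated back to back.
            phrase = transitions[used % len(transitions)]
            used += 1
            text = f"{phrase} {text[0].lower()}{text[1:]}"
        output.append(text)
        previous = sentiment
    return " ".join(output)


# The third sentence is made up for this example.
paragraph = add_transitions([
    ("Widgets were down this month, falling 2.5%.", "bad"),
    ("The last week of the month rose 4.5%.", "good"),
    ("Sales are still below last year's level.", "bad"),
])
print(paragraph)
```

Even this small example shows why one rule is not enough: the choice of phrase depends on what was written in the sentences before it, not only on the sentence at hand.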
Have no Fear!
These are all complex challenges, but thankfully infoSentience has built techniques that can handle The Three T’s with no problems. Whether it’s a sales report, stock report, sports report, or more, infoSentience can write it up at the same quality level as the best human writers. Of course, we can also do it within seconds and at near-infinite scale (Bill Murray voice: “so we got that going for us”).