Using AI To Help Turn Journalism Into Structured Data

By Jacob Cohen Donnelly

One of the biggest crimes in the journalism business is that we collect so much information from interviewing sources and then two things happen. First, we deliver a cut of that information in an article format, which is inherently limiting. And second, much of the remaining information winds up on the cutting-room floor.

And for as long as journalism has been a thing, this has been the case. With the arrival of generative AI, however, there are now opportunities to extract so much more value from all of that information, which could turn into new products or enhance the ones we currently offer. The key reason is that, for the first time ever, it’s cost effective to take unbelievably unstructured information—articles and interview transcripts—and turn it into structured data.

Let’s pause for a second and define those two terms.

  • Structured Data: This is data that has a clear taxonomy. Imagine a spreadsheet with clearly labeled columns and then you fill in the rows accordingly. It’s straightforward to scan and can be actionable.
  • Unstructured Data: This is data that has no taxonomy. There are a ton of facts and information in it, but it hasn’t been organized in a way that can be scanned or made actionable.
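To make the contrast concrete, here is a minimal sketch of the same fact in both forms. The fund name and field names ("entity", "topic", "signal", "quote") are my own illustration, not a standard schema:

```python
# The same fact, in unstructured and structured form. "Acme Pension
# Fund" and the field names are hypothetical placeholders.
unstructured = (
    "We're actually considering pulling back from private equity a bit."
)

structured = {
    "entity": "Acme Pension Fund",  # who said it
    "topic": "private equity",      # what it concerns
    "signal": "reducing exposure",  # the actionable takeaway
    "quote": unstructured,          # the supporting evidence
}

# Structured rows can be filtered, sorted, and aggregated; a pile of
# raw quotes cannot.
print(structured["signal"])
```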

Think about a journalist sitting down for an interview. They ask a number of questions and then get answers that might be long-winded and go on tangents that have no bearing on the core story they’re working on. Perhaps a good journalist will look back on that and identify other threads they can pull, but unfortunately, much of it is just going to go in the garbage.

But what if, instead, you took the entire transcript and put it into your own AI agent and had it analyze it from end to end? This could be done at the end of every interview, with the agent told to look for specific types of information. The outcome would be taking an unstructured transcript and turning it into something structured.
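As a rough sketch of what that agent step could look like: send the transcript to a language model with a prompt that demands JSON in a fixed schema, then validate what comes back before it touches your database. The prompt wording, schema fields, and helper below are all hypothetical, and the model call itself is elided since any provider's chat API would do:

```python
import json

# Hypothetical instruction sent alongside the full transcript. The
# schema fields are illustrative, not a standard.
EXTRACTION_PROMPT = (
    "Read the interview transcript and return a JSON array. Each item "
    "must have the fields: entity, topic, signal, quote. Include every "
    "concrete, newsworthy data point, even ones off the main story."
)

def parse_extraction(raw_reply: str) -> list[dict]:
    """Validate the model's JSON reply before storing it as rows."""
    rows = json.loads(raw_reply)
    required = {"entity", "topic", "signal", "quote"}
    for row in rows:
        missing = required - row.keys()
        if missing:
            raise ValueError(f"model reply missing fields: {missing}")
    return rows

# In practice raw_reply would come from an LLM call; here is a
# hand-written stand-in to show the round trip.
raw_reply = json.dumps([{
    "entity": "Acme Pension Fund",
    "topic": "private equity",
    "signal": "reducing exposure",
    "quote": "We're considering pulling back from private equity.",
}])
rows = parse_extraction(raw_reply)
```

Validating the reply matters: model output is untrusted input, so a malformed row should fail loudly rather than silently corrupt the dataset.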

This idea was inspired by a conversation I had with a financial media operator recently. He said the thing that many of their readers care about is what big asset allocators (folks who invest on behalf of other people) are going to buy. So, if you’re Blackstone (major private equity firm), you care what the head of investing at The California Public Employees’ Retirement System (CalPERS) is thinking about investing in.

Perhaps when the reporter interviewed the head of investing at CalPERS that was a direct question. But maybe it wasn’t. Maybe the reporter was thinking about CalPERS’ interest in bonds and the head of investing said, “We’re actually considering pulling back from private equity a bit.” It’s one sentence that may never wind up in the story, but that’s a data point. CalPERS could become a row on a spreadsheet, flagged as an allocator that is thinking about reducing its PE exposure.

You could then use that same AI agent to create charts over time that show how CalPERS has changed its investment thesis, or to identify the ten biggest pension funds that are all pulling back on PE investment. The potential is considerable once you start to peel back the layers.
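Once those rows accumulate across many interviews, the chart-ready aggregates fall out of ordinary grouping. A minimal sketch, assuming rows shaped like the hypothetical schema above (fund names invented):

```python
from collections import Counter

# Hypothetical extracted rows accumulated over many interviews.
rows = [
    {"entity": "Fund A", "signal": "reducing exposure", "year": 2023},
    {"entity": "Fund B", "signal": "reducing exposure", "year": 2024},
    {"entity": "Fund A", "signal": "reducing exposure", "year": 2024},
    {"entity": "Fund C", "signal": "increasing exposure", "year": 2024},
]

# Which allocators are pulling back, and how often have they said so?
pullbacks = Counter(
    r["entity"] for r in rows if r["signal"] == "reducing exposure"
)
top = pullbacks.most_common(10)  # e.g. the "ten biggest funds pulling back"
```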

Then there’s The Information. It is well known for its org charts showing how various folks report to the CEOs of some of the biggest technology firms. While gathering that information is likely easy, it still requires the reporter to remember who reports to whom and update a corresponding spreadsheet. With AI, that can be done automatically, ideally allowing the reporter to create a much more robust org chart.
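The same idea applies to reporting lines: each “who do you report to?” answer becomes a (report, manager) pair, and the chart is just those pairs grouped by manager. A minimal sketch with hypothetical titles:

```python
from collections import defaultdict

def build_org_chart(pairs):
    """Group (report, manager) pairs into a manager -> [reports] map."""
    chart = defaultdict(list)
    for report, manager in pairs:
        chart[manager].append(report)
    return dict(chart)

# Pairs as they might be extracted from several interviews.
pairs = [
    ("Head of Product", "CEO"),
    ("Head of Engineering", "CEO"),
    ("Platform Lead", "Head of Engineering"),
]
chart = build_org_chart(pairs)
```

Because the pairs can come from different interviews months apart, the chart fills in incrementally as coverage deepens, without anyone having to remember to update a spreadsheet.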

AMO has the potential for this too. We have a private equity database, where we attempt to track the deals that occur. But we don’t always remember to update it and we might hear about a deal in an interview that doesn’t make its way into a story. With AI, we could take our interviews and immediately extract that a deal was recently done. It’s better for the user and it’s easier for us.

And for every niche that exists, there is unstructured data that can be found in the process of interviewing sources that could be turned into structured data. But why do we care about trying to make this unstructured information more organized?

A big reason is that it unlocks the potential to create a high-priced subscription product. Think about the difference between subscriptions for a media company and for an information services company. The former is primarily selling access to articles as they happen. The latter is selling access to data. The reality is, consumers and investors tend to value data businesses much more highly than traditional media companies.

I’m not suggesting that media companies give up their core models. On the contrary, the very muscle of being a journalism-driven company makes this sort of product possible. Only a good reporter has the training and skill to get great information out of sources. It’s just the mode in which we deliver that information to the reader—the article—that limits us. AI changes that.

You might start to augment some of the questions that reporters ask. So, if org charts are a standard product of yours, maybe your reporter is tasked with asking every source, “Who do you report to?” Or “Who reports to you?” It adds 30 seconds to the interview, but suddenly, you’ve got a lot of incremental data that can be organized.

The outcome could be a business with multiple revenue streams. Perhaps some information is free and monetized with advertising. Then there’s a higher-priced subscription for all articles. And then there’s an uber-premium subscription that gives access to all the structured, non-article interview data. One interview turns into multiple outputs. That’s how you get more bang for your buck.