Key questions around OpenAI’s licensing deals with publishers

By Jack Marshall

OpenAI now has content licensing agreements in place with several major global publishers including News Corp, The Atlantic, Vox Media, Axel Springer, the Financial Times, Dotdash Meredith, Le Monde, and the Associated Press.

Publishers have positioned those deals as wins for their businesses and as evidence that the generative AI firm fairly values access to their content. But for publishers mulling the possibility of their own AI licensing arrangements, several key questions are now top of mind:

  1. Is “playing ball” with OpenAI in publishers’ best interests long-term?
  2. How much are publishers being paid, and how is their content being valued?
  3. Do publishers have any meaningful leverage in negotiations with OpenAI, or are they essentially taking whatever they can get?
  4. What types of publishers might benefit most from licensing deals, and which have the most to lose without them?
  5. How many licensing deals will OpenAI strike, and do publishers risk “missing the boat” if they fail to strike agreements soon?
  6. Are deals exclusive, and could a bidding war drive up prices for access to publishers’ content?

Terms of OpenAI’s publisher deals

Specific details of OpenAI’s deals with publishers remain unclear, but arrangements so far cover:

  1. Training data: The deals allow OpenAI to use participating publishers’ content to train its large language models.
  2. Greater visibility for publishers: Content from participating publishers will receive priority placement and “richer brand expression” in ChatGPT responses via three different linking mechanisms. These include an “anchor” treatment, which places branded, clickable buttons linking to publishers’ sites below responses, and an “inline” treatment, which inserts pull quotes from publishers’ content directly into responses alongside links to publishers’ sites.
  3. Financial terms: OpenAI has pitched some publishers on a model in which they’re paid a fixed fee for allowing OpenAI to access their content archives for training purposes, with separate performance-based payments depending on how much users engage with their content in ChatGPT responses. The guaranteed and variable components would be combined into an annual payout. It’s unclear whether existing deals are based on these terms.
  4. Ongoing collaboration: Announcements so far have all referenced ongoing collaboration, under which OpenAI will supposedly work with publishers to create new products and services built around their content.

Is “playing ball” with OpenAI in publishers’ best interests?

Not everyone is convinced that licensing arrangements with OpenAI are in publishers’ best interests in the long run. Technology companies have for decades used content to help grow their audiences and engagement on their platforms, and many would argue they’ve done so without adequately compensating content creators and rights holders.

“Media companies are making a huge mistake by licensing to AI,” The Information’s founder Jessica Lessin argued in a contributed piece for The Atlantic last week. “Publishers should be patient and refrain from licensing away their content for relative pennies… It’s simply too early to get into bed with the companies that trained their models on professional content without permission and have no compelling case for how they will help build the news business,” she wrote.

Given that OpenAI and other generative AI platforms have already used publishers’ content to help train their large language models without permission, licensing deals are effectively designed to “absolve them of the theft” and “amount to settling without litigation,” she added. 

It’s easy to see Lessin’s point. But while shunning AI might be in some publishers’ best interests long-term, it may not make sense for all. “I think the analysis is different for the big generalists,” Business Insider founder Henry Blodget wrote on X. “The $250mm News Corp just got from OpenAI is extremely meaningful. And if in 5 years the deal sucks for News Corp, they can cancel. If News Corp just said ‘no,’ meanwhile, the future might happen without them. This way, they get paid (a lot) to experiment. Also, OpenAI will be fine with or without News Corp (and, frankly, any other news orgs)… So I think News Corp is smart to take the money and play ball for a while.”

Financial Times chief executive John Ridding has expressed a similar view, suggesting that “pulling up the drawbridge or trying to hold back the tide is not going to be a strategy for success” for publishers. Investing in high-quality journalism and content “means little” if publishers are “cut out of the loop between readers and information, and therefore revenues,” he said onstage at the WAN-IFRA World News Media Congress last week.

How much are publishers being paid?

Financial details of OpenAI’s existing licensing agreements have not been disclosed by the company or its publisher partners, but some figures are beginning to emerge.

OpenAI’s multi-year licensing agreement for access to News Corp’s content could be worth up to $250 million in cash and credits for the use of OpenAI over five years, according to WSJ. The Information reported that OpenAI has pitched some news organizations with offers starting at $1 million, although it’s unclear what those proposed deals might have entailed. Elsewhere, Google’s licensing deal with Reddit is worth around $60 million a year, according to the Associated Press, although the nature and scale of Reddit’s content is very different from that of most publishers.

Platforms are typically careful to protect the details of their payments to publishers, often disguising them as part of broad agreements that include vague references to partnering around technology or data. Publishers are usually subject to non-disclosure agreements, too.

How much publishers are being paid is, of course, an entirely different question from how much their content will be worth to AI companies over the coming years. There’s currently no good way to estimate the latter. Each publisher will value its content differently for various reasons, and with the landscape evolving so quickly and dramatically, it’s essentially impossible to determine what a “fair” price is likely to be anyway.

In the long term, OpenAI may devise mechanisms to license content and compensate publishers in a more automated and scalable fashion based on how frequently it’s used, surfaced, or otherwise utilized by the company’s systems. That’s far from a given at this point, though, and some publishers suggest that type of model would likely work to OpenAI’s advantage anyway.

Do publishers have any meaningful leverage in negotiations with OpenAI?

With few details about OpenAI’s publisher agreements currently known, it remains unclear if publishers are driving hard bargains or if they’re essentially taking whatever they’re offered by OpenAI for fear of being cut off from its user base entirely. 

The rise of generative AI has many publishers worried that platforms such as ChatGPT and Google’s search engine will soon send significantly less traffic to their properties. Some publishers believe they have even less leverage with tech companies than they have had historically, given that much of their content has already been ingested, legally or otherwise, by those companies’ large language models. Technology companies often find it easier to ask for forgiveness than permission when disrupting industries, and it appears the same is true when it comes to AI training data.

Publishers with strong brands and those regularly publishing new and unique content may find themselves best positioned to negotiate favorable terms, but others may feel they have little choice but to take what they can get. “Without an agreement, they will use our content in a more or less rigorous and more or less clandestine manner without any benefit for us,” Le Monde CEO Louis Dreyfus told WSJ.

Instead of entering into partnerships, some publishers are opting to file lawsuits against OpenAI. But in practice, those suits are likely intended to help publishers claw back some leverage in licensing negotiations rather than force the AI training toothpaste back in the tube.

What types of publishers might benefit most from licensing deals?

The importance of licensing deals could vary significantly from one publisher to the next. Whether licensing deals are “good” or “bad” for a specific publisher’s business will depend largely on the nature of its brand, content, audience, and business model (in addition to compensation terms, of course).

For publishers with strong brands and established paying relationships with audiences, it seems plausible AI licensing deals might not cannibalize their existing businesses and could ultimately help them generate incremental revenue by exposing their content to new audiences.

As more premium publishers strike content licensing deals with AI companies, this could give rise to a model where casual readership is monetized primarily on third-party platforms, and publishers focus their efforts instead on monetizing their most engaged audiences via subscriptions, events, and other products sold directly to audiences. In that scenario, publishers with healthy licensing arrangements and robust subscription businesses may feel they’re relatively well-positioned to capitalize as AI platforms continue to gain traction. 

Publishers that rely heavily on referral traffic from third-party platforms and/or advertising revenue to support their businesses will likely be looking at the situation more negatively, however, and focusing instead on what shape their businesses could be left in if they don’t play ball with AI firms. Some publishers may feel they have significantly more to lose than others, particularly if their business models risk being fundamentally and irreparably broken by generative AI.

How many licensing deals will OpenAI strike?

OpenAI’s current deals grant it access to new and archived content from over 60 individual publications, but it’s unclear how many deals it’s likely to strike in total. 

It stands to reason that OpenAI will want access to a handful of major news sources to enable it to deliver timely results to queries linked to current events. But for more evergreen and/or commoditized information it may feel less need to update its training data as frequently or to pay publishers for access to it at all. 

And even if OpenAI does devise mechanisms to license content and compensate publishers at scale, it’s hard to imagine it compensating hundreds of thousands of publishers the way platforms such as Google have via advertising in recent decades.

There’s a palpable sense among some publishers that they risk “missing the boat” and being left out in the cold by OpenAI if they don’t manage to reach licensing agreements early. But others suggest the company’s existing deals could have more to do with staving off regulation and/or litigation than gaining access to content anyway. If that’s the case, its budget for deals might be dictated primarily by its legal teams rather than by its core product needs or any desire to support sustainable journalism.

Are deals exclusive, and could a bidding war drive up prices?

It remains unclear if OpenAI’s existing deals are in any way exclusive, or if publishers have retained the rights to strike similar agreements with Google and other AI providers.

If OpenAI’s deals are non-exclusive, publishers might be able to seek compensation from a range of AI platforms, and platforms that lack access to some key licenses could be left at a disadvantage. Alternatively, publishers willing to agree to exclusive deals could be in a position to drive up prices by pitting prospective licensees against each other. The likelihood of that scenario depends largely on what Google does next in terms of its plans to compensate (or at least appease) publishers.

Some publishers hope the rise of AI will result in a dynamic marketplace in which publishers can effectively auction off licensing rights to the highest bidder. Others are less optimistic and believe that the hugely complicated nature of large language models will make policing the use of copyrighted content essentially impossible anyway.