OpenAI will let publishers opt out of AI training
OpenAI announced last week it’s developing a tool called Media Manager, which will enable publishers and other content owners to flag copyrighted material to the company and to specify whether or not they want their work to be used to help train its models.
“We believe AI systems should benefit and respect the choices of creators and content owners,” a post announcing Media Manager read, adding, “We’re continually improving our industry-leading systems to reflect content owner preferences, and are dedicated to building products and business models to fuel vibrant ecosystems for creators and publishers.”
Last year OpenAI said publishers could use robots.txt files on their websites to express preferences about the use of their content to train AI. The company says it “takes these signals into account” when training new models, although it hasn’t explicitly stated that it honors those requests.
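For publishers who want to see what such a robots.txt signal looks like in practice, here is a minimal sketch in Python. It assumes the crawler identifies itself with the user-agent token GPTBot, which is the name OpenAI documents for its web crawler; the example.com URL and the `can_crawl` helper are hypothetical and used only for illustration. The check shows only what the file requests, not whether any crawler actually complies.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content: asks GPTBot to stay out of the whole site
# while leaving other crawlers unrestricted. (Illustrative only.)
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/story.html"  # placeholder URL
    print("GPTBot allowed:", can_crawl(ROBOTS_TXT, "GPTBot", url))
    print("Other crawler allowed:", can_crawl(ROBOTS_TXT, "SomeOtherBot", url))
```

A publisher could run the same check against its live robots.txt file to confirm the rules read the way it intended before relying on them.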
However, the robots.txt opt-out is an incomplete solution, according to OpenAI. “Many creators do not control websites where their content may appear, and content is often quoted, reviewed, remixed, reposted and used as inspiration across multiple domains. We need an efficient, scalable solution for content owners to express their preferences about the use of their content in AI systems,” it said.
To remedy that, OpenAI says it is drawing on machine learning research to build Media Manager, which it describes as the first tool of its kind to help identify copyrighted text, images, audio, and video across multiple sources and reflect creator preferences. The company is collaborating with creators, content owners, and regulators to develop the tool, and it aims to have it in place by 2025.
OpenAI is stepping up its efforts to be seen as a friend and partner to publishers and the broader media ecosystem. It’s already struck content licensing deals with a handful of major publishers, but it remains unclear how much leverage publishers realistically have in licensing negotiations, and whether they’re essentially taking whatever they can get from OpenAI for fear of “missing the boat” and being shut off from its users altogether.
“We’re building products to benefit users, creators and publishers in a vibrant ecosystem,” OpenAI said. “We recently improved source links in ChatGPT to give users better context and web publishers new ways to connect with our audiences. We’re also working with partners to display their content in our products and increase their connection to readers.”
The company also positioned its AI models as “learning machines” rather than databases, to stress that its systems do not simply “regurgitate” existing content, but instead stitch existing information together in new ways. That information comes largely from crawling websites, it said, but it excludes sources it knows have paywalls or have opted out.