AI Training Data

Google Introduces Google-Extended: A New Feature for Publishers to Opt Out of AI Training Data

Sven

October 2nd, 2023

Google has introduced a new control called Google-Extended that lets website publishers opt out of having their content used to train the company's AI models. Publishers can still have their sites crawled and indexed for Search by Googlebot without contributing to AI model development. The announcement is part of Google's stated effort to give web publishers choice and control as AI applications expand.

Google-Extended is used in robots.txt, where it lets publishers manage whether their sites help improve Bard and the Vertex AI generative APIs. By allowing or disallowing this token, publishers control whether their content is available for that purpose, separately from Search indexing. A site that opts out keeps its publicly available content from being used to train AI chatbots like Bard.
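As a rough illustration of the opt-out described above, the sketch below embeds a sample robots.txt that disallows Google-Extended while leaving all other crawlers allowed, then checks the rules with Python's standard-library parser. The site URL, the exact rule layout, and the `is_allowed` helper are assumptions made for this example, not text from Google's announcement. Google-Extended acts as a control token rather than a separate crawler, so the check only shows what the file declares.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: opt out of Google-Extended
# (AI training) while leaving every other crawler, including Googlebot,
# free to crawl the site.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

With these rules, the parser reports Google-Extended as disallowed and Googlebot as allowed, which is the split the article describes: the site stays in search while opting out of AI training.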

Many websites have already blocked the web crawler OpenAI uses to collect training data for ChatGPT. Blocking Google has been harder, because shutting out Google's crawlers entirely would also remove a site from search results. Some sites, such as The New York Times, have responded on the legal side instead, updating their terms of service to prohibit companies from using their content to train AI.

Google says it will keep exploring additional machine-readable approaches to choice and control for web publishers as AI applications expand. It describes Google-Extended as a first step, with more options to follow. Google presents the move as part of a commitment to transparency and to giving publishers more say in how AI uses their work.

Google-Extended is a notable step for web publishing in the AI era. Publishers can now opt out of AI training without giving up their place in search results, which addresses a concern many sites had raised. Google frames the change as part of its commitment to responsible AI practices.

Links: Google Announcement