How to Keep Your Site's Content From Being Used to Train AI

If you’d rather your content not be used to train our future robot overlords, here are the steps for asking ethical AI companies not to scrape your site for training data.

We thought we’d pass along this article, published today by the folks at the Electronic Frontier Foundation.

Here’s a little sample to let you know what it’s about:

“We’ve long been supporters of the right to scrape websites—the process of using a computer to load and read pages of a website for later analysis—as a tool for research, journalism, and archivers. We believe this practice is still lawful when collecting training data for generative AI, but the question of whether something should be illegal is different from whether it may be considered rude, gauche, or unpleasant. As norms continue to develop around what kinds of scraping and what uses of scraped data are considered acceptable, it is useful to have a tool for website operators to automatically signal their preference to crawlers. Asking OpenAI and Google (and anyone else who chooses to honor the preference) to not include scrapes of your site in its models is an easy process as long as you can access your site’s file structure.”

You can read the entire article here: No Robots(.txt): How to Ask ChatGPT and Google Bard to Not Use Your Website for Training
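For a rough idea of what that preference signal looks like in practice, here is a minimal sketch. It assumes the crawler tokens GPTBot (OpenAI) and Google-Extended (Google), which are not named in the excerpt above, and it uses example.com and the helper name is_allowed purely as placeholders. The rules in ROBOTS_TXT are what would go in a robots.txt file at the root of your site; the Python around them only checks how a crawler that honors the file would read them.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: the two user-agent tokens below are the
# ones we believe OpenAI and Google publish for AI-training opt-outs.
# Confirm them against each company's current documentation.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler honoring robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and SomeOtherBot are hypothetical placeholders.
    for agent in ("GPTBot", "Google-Extended", "SomeOtherBot"):
        verdict = is_allowed(ROBOTS_TXT, agent, "https://example.com/post")
        print(f"{agent}: {'allowed' if verdict else 'blocked'}")
```

Keep in mind that robots.txt is a request, not a lock. It only works on crawlers that choose to honor it, and agents not listed in the file, such as ordinary search crawlers, are unaffected by these rules.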
