In the age of generative AI, publishers and content creators face an unprecedented challenge: your work can be scraped, indexed, and used to train large language models (LLMs) without your consent or compensation. This guide offers a quick overview of your rights, your options, and your tools to take back control.
1. Understanding the Risk
AI models are often trained on publicly accessible content scraped from the web. This includes:
- Articles, blogs, and editorials
- Reviews, recipes, and guides
- Metadata (headings, categories, tags)
- Images and alt-text
The impact? Your words may be reproduced without context or attribution, and without any traffic returning to your site.
2. What Does the Law Say?
EU Directive on Copyright (DSM Directive)
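- Article 3 allows text and data mining (TDM) for scientific research by research organisations and cultural heritage institutions.
- Article 4 allows TDM for any purpose, including commercial use, on content that is lawfully accessible.
- But: Under Article 4, you can opt out by expressly reserving your rights, for example in your Terms of Use or, for online content, by machine-readable means such as robots.txt.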
Key Legal Concepts:
- Moral Rights: Your right to attribution and integrity of your work.
- Economic Rights: Control over reproduction and distribution.
- Database Protection: In the EU, structured collections of articles may be protected.
3. How to Opt Out of AI Crawling
A. Update Your robots.txt
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
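To check that rules like the ones above behave as intended before you publish them, you can test them locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed and the example.com URL are assumptions made for illustration, and real crawlers may interpret robots.txt slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The opt-out rules from this section, one directive per line.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text.

    Hypothetical helper: it parses the text in memory instead of
    downloading a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL, not a real page.
    page = "https://example.com/articles/lisbon-guide"
    for agent in ("GPTBot", "Google-Extended", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

Keep in mind that robots.txt is a request, not an access control: it only affects crawlers that choose to honor it.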
B. Add a Legal Disclaimer
Include this in your footer or legal notice:
“Content on this site may not be used for the development or training of AI systems, machine learning models, or any automated systems without explicit permission.”
4. Tools and Actions You Can Take
- Monitor your server logs for unusual crawler behavior.
- Use services like Cloudflare or server firewalls to block abusive bots.
- Join industry groups (like EATW!) to push for collective enforcement.
- Report misuse to data protection authorities or copyright bodies.
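As a starting point for the log monitoring mentioned above, the sketch below counts requests whose user-agent string contains a known AI crawler token. It rests on several assumptions: your server writes the common "combined" access log format, where the user agent is the last quoted field; the token list is illustrative and incomplete; and the sample log lines are invented for the example.

```python
import re
from collections import Counter

# Assumption: an illustrative, incomplete list of crawler tokens to look for.
AI_CRAWLER_TOKENS = ("GPTBot", "CCBot", "ClaudeBot")

# Assumption: combined log format, user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

# Invented sample lines, for demonstration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Mar/2025:10:00:00 +0000] "GET /guides/rome HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [10/Mar/2025:10:00:02 +0000] "GET /guides/porto HTTP/1.1" 200 4870 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [10/Mar/2025:10:01:10 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.44 - - [10/Mar/2025:10:02:30 +0000] "GET /reviews/ HTTP/1.1" 200 3300 "-" "CCBot/2.0"',
]


def count_ai_crawler_hits(log_lines, tokens=AI_CRAWLER_TOKENS):
    """Count log lines per crawler token (case-insensitive substring match)."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # Line does not end with a quoted user-agent field.
        user_agent = match.group(1).lower()
        for token in tokens:
            if token.lower() in user_agent:
                hits[token] += 1
    return hits


if __name__ == "__main__":
    for token, count in count_ai_crawler_hits(SAMPLE_LOG).most_common():
        print(f"{token}: {count} request(s)")
```

To run it against a real log, pass an open file object instead of SAMPLE_LOG. User-agent strings can be spoofed, so treat the counts as a signal rather than proof. As far as we are aware, Google-Extended is a robots.txt control token rather than a separate user-agent string, so it is not expected to show up in logs.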
5. What About Search Engines?
Crawlers like Googlebot are still essential for visibility. Blocking everything is not recommended. Use selective rules instead:
User-agent: Googlebot
Disallow: /private-directory/
Allow: /
Use robots.txt and sitemaps strategically to allow good bots and block exploitative ones.
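If you maintain several sites, it can help to generate one robots.txt that combines both kinds of rule: full blocks for AI crawlers and selective rules for search engines. The sketch below is one possible approach, not a standard tool; the function build_robots_txt, its parameters, and the example.com sitemap URL are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_agents, private_paths,
                     search_agent="Googlebot", sitemap_url=None):
    """Build robots.txt text: block some agents fully, limit one selectively.

    Hypothetical helper for illustration; adapt the agent names to your needs.
    """
    lines = []
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines.append(f"User-agent: {search_agent}")
    lines += [f"Disallow: {path}" for path in private_paths]
    lines.append("Allow: /")
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


def can_fetch(robots_txt, user_agent, url):
    """Check a URL against the generated rules with the standard parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    text = build_robots_txt(
        blocked_agents=["GPTBot", "Google-Extended"],
        private_paths=["/private-directory/"],
        sitemap_url="https://example.com/sitemap.xml",  # placeholder URL
    )
    print(text)
    print(can_fetch(text, "Googlebot", "https://example.com/guides/rome"))
```

Generating the file from a short list keeps the blocked agents in one place, which makes it easier to add new crawler names as they appear.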
6. Moving Forward as a Community
As independent publishers, we have strength in numbers. Share resources, support fair licensing, and educate your peers. The web may be open, but your content isn’t up for grabs.
Contact
European Association of Travel Writers (EATW)
www.eatw.org
For questions or to join our mailing list: