Sunday, March 17, 2024

How to Block Google Bard and Perplexity AI with robots.txt


In today's digital landscape, AI-powered bots can pose unique challenges for website owners concerned about content security and control. Fortunately, the robots.txt file is a valuable tool for regulating bot access and safeguarding your online assets. Here's how you can use it to block several generative AI crawlers:

1. Blocking Google AI (Bard and Vertex AI Generative APIs)

To tell Google not to use your content to improve Bard and the Vertex AI generative APIs, add the following directives to your robots.txt file:

User-agent: Google-Extended
Disallow: /

Blocking Google-Extended does not remove your pages from Google Search; it is a control that tells Google not to use your content to improve its generative AI products, which helps you keep a say in how your work is used.
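If you want to confirm that the rule is actually being served and parsed, a quick check is possible with Python's standard library. This is a minimal sketch assuming your robots.txt lives at https://example.com/robots.txt (a placeholder address); the same check works for any user agent covered in this post.

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (replace the placeholder domain)
parser = RobotFileParser("https://example.com/robots.txt")
parser.read()

# can_fetch() returns False when the user agent is disallowed
print(parser.can_fetch("Google-Extended", "https://example.com/"))  # expect: False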


2. Blocking CommonCrawl (CCBot)

Common Crawl is a non-profit organization, but its crawler, CCBot, builds a freely available archive of the web that is widely used as training data for AI models. To block CCBot, include the following lines in your robots.txt file:

User-agent: CCBot
Disallow: /

While Common Crawl's intentions may be benign, preventing its bot from archiving your content helps keep your data out of publicly distributed training datasets.
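If you are curious whether CCBot has been visiting your site at all, a rough way to find out is to scan your access log for its user-agent string. The sketch below assumes a standard combined-format log at /var/log/nginx/access.log; adjust the path for your own server.

# Count requests whose user-agent string mentions CCBot
LOG_PATH = "/var/log/nginx/access.log"  # assumed location -- adjust as needed

ccbot_hits = 0
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "CCBot" in line:
            ccbot_hits += 1

print(f"Requests from CCBot: {ccbot_hits}")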

3. Blocking Perplexity AI

Perplexity AI crawls the web with PerplexityBot and uses generative AI to summarize and rewrite content in its answers, potentially altering the original intent or context. To block it, use the following directives:

User-agent: PerplexityBot
Disallow: /

Perplexity AI also publishes the IP address ranges its crawler uses. Blocking those ranges at your web application firewall (WAF) or server firewall provides an extra layer of protection against crawlers that ignore robots.txt.
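As an illustration only, server-level blocking in nginx might look like the snippet below. The CIDR ranges shown are documentation placeholders, not Perplexity's actual addresses; substitute the ranges the company publishes.

# nginx: deny traffic from specific crawler IP ranges
# 192.0.2.0/24 and 198.51.100.0/24 are placeholder ranges -- replace them
# with the ranges published by Perplexity AI
location / {
    deny 192.0.2.0/24;
    deny 198.51.100.0/24;
    allow all;
}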

By combining these robots.txt rules with firewall-level blocking where needed, you can effectively manage the risks of AI-powered bots accessing your website's content. Remember, proactive measures are essential for maintaining control over your online presence and protecting your intellectual property.
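For reference, a complete robots.txt that blocks all three crawlers simply stacks the groups from the sections above, one per user agent:

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /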
