The Hidden Cost of AI Innovation: Is Your Website Paying the Price?

When Artificial Intelligence Becomes a Real Problem for Website Owners

By Ryan Thrash  |  March 28, 2025  |  9 min read

In today’s digital landscape, a silent battle is raging that threatens the very foundation of your online presence. While you’re focused on attracting human visitors, your website might be drowning under a flood of automated AI crawlers that scrape content with unprecedented intensity.

Illustration: a small boat with a server rack in it, about to be capsized by a massive AI robot wave, with an erupting volcano in the background, symbolizing website struggles with AI and mass chaos

The Problem Is Real (And Getting Worse)

AI is beyond revolutionary, inspiring, and frankly mind-boggling in the grandest sense of the word.

But the rise of generative AI has unleashed a new generation of web crawlers that don’t play by the old rules.

These aren’t your polite auntie’s search engine bots that dutifully follow instructions and only take what they need.

Today’s AI crawlers are voracious, relentless, resource-hungry machines. They can bring even well-architected websites to their knees.

Plugin-bloated websites don’t stand a chance.

These new scrapers inhale everything they can to train their AI overlords’ models, and every startup that wants to join those overlords is scraping right alongside them. They can, and probably will, destroy your site’s performance, effectively taking it offline.

This leaves your site visitors frustrated. Not exactly ideal if you rely on your site for leads or sales.

Consider These Alarming Facts

Important note: This isn’t just a problem with obscure AI startups or fringe players. At MODX, we’ve seen firsthand that even the most mainstream, established AI providers—including Anthropic (Claude), Meta (Facebook), Amazon, and Baidu—can cause significant server load issues. The industry leaders may have better crawling practices than smaller players, but their massive scale means they can still overwhelm websites with requests.

Why Your Website Is Vulnerable

If you’re thinking, “This won’t affect my site,” think again. These bots aren’t just targeting large tech platforms. They’re indiscriminately crawling the entire web, including:

  • Small business websites
  • Digital agency client sites
  • Blogs and content hubs
  • E-commerce storefronts
  • Service-based business pages

Particularly vulnerable are content-rich sites with frequently updated information and paginated navigation structures. This includes:

  • Large blog platforms with archive pages
  • SEO-optimized sites with numerous keyword-focused articles
  • Government (.gov) information portals
  • Educational institution (.edu) websites
  • Media sites with extensive article collections

These sites create a “perfect target” for AI crawlers due to their structured content organization, depth of information, and frequently updated material.

The real danger lies in how these crawlers operate. They don’t just visit your homepage—they request thousands of pages, often including non-existent URLs their AI systems have hallucinated. This creates a perfect storm of server strain that can:

  1. Slow down your website for real human visitors
  2. Increase your hosting costs dramatically
  3. Skew your analytics data (making your marketing decisions less effective)
  4. Potentially crash your site during peak business hours

Even mainstream AI companies with better ethics policies can cause these problems. One recent case study found that Anthropic’s Claude crawler hit a single site about a million times in 24 hours, despite the company’s stated commitment to responsible crawling. When the most responsible actors in the space can cause this level of disruption, imagine what the less scrupulous ones are doing.

The Path Forward: How to Protect Your Digital Investment

As a website owner or digital agency, you need practical solutions that don’t require a computer science degree. Here’s your action plan:

1. Recognize the Signs

Your website may be under bot pressure if:

  • Server loads have inexplicably increased
  • Pages load more slowly than usual, or you experience brief outages that previously didn’t occur
  • Hosting costs are rising without corresponding business growth
  • Analytics show unusual traffic patterns with high bounce rates
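A quick, low-tech way to check for those patterns is a short script over your web server’s access log. The sketch below is only illustrative: it assumes a combined-style log where the user agent is the last quoted field, and the log path is a placeholder you would adjust for your own server.

    # Rough sketch: count requests per user agent in an access log to spot
    # crawlers that dwarf human traffic. Assumes the combined log format
    # (user agent is the last quoted field); the log path is a placeholder.
    import re
    from collections import Counter

    LOG_PATH = "/var/log/nginx/access.log"     # adjust for your server
    UA_PATTERN = re.compile(r'"([^"]*)"\s*$')  # last quoted field on the line

    counts = Counter()
    with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = UA_PATTERN.search(line)
            if match:
                counts[match.group(1)] += 1

    # The ten busiest user agents; bot names such as GPTBot or ClaudeBot near
    # the top are a strong hint of crawler pressure.
    for agent, hits in counts.most_common(10):
        print(f"{hits:>8}  {agent}")

Even a simple tally like this makes it obvious when a handful of crawlers are generating the bulk of your traffic.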

2. Implement Basic Protections

Even without technical expertise, you can take these steps:

  • Update your robots.txt file to specifically exclude AI crawlers like GPTBot and ClaudeBot (though be aware many bots simply ignore these directives); a sample robots.txt appears after this list
  • Implement rate limiting through your hosting provider or a service like Cloudflare to restrict the number of requests from specific IP ranges, user agents, ASNs and other origins
  • Consider a Web Application Firewall (WAF) that can identify and block problematic traffic
  • Set up geographical filtering with managed challenges for all traffic originating outside your target market—if your business only serves North America, there’s little reason for servers in Asia to access your site thousands of times per hour
  • Review your analytics regularly to spot unusual patterns
  • Explore bot management tools like Cloudflare’s AI Labyrinth that use clever countermeasures instead of simple blocking. Fair warning: this one is early in the game and may cause issues with your site…
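For reference, a minimal robots.txt along the lines of the first bullet might look like the example below. The user-agent tokens shown here (GPTBot for OpenAI, ClaudeBot for Anthropic, CCBot for Common Crawl) are the crawler names those providers publish at the time of writing; check each provider’s documentation for the current list, and remember that the worst offenders ignore this file entirely.

    # robots.txt: opt out of common AI training crawlers, allow everything else
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # All other crawlers keep normal access
    User-agent: *
    Disallow: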

3. Work With Experts When Needed

Some situations require professional intervention:

  • Persistent performance issues despite basic measures
  • Dramatic hosting cost increases
  • Complete site outages during bot attacks

Advanced Defense: Beyond Basic Blocking

Traditional blocking methods are increasingly ineffective against sophisticated AI crawlers. These bots can:

  • Spoof user-agent strings to hide their identity
  • Use residential IP addresses as proxies to evade IP-based blocks
  • Cycle through multiple IP addresses to avoid rate limits
  • Completely ignore robots.txt directives

This is why innovative approaches like Cloudflare’s beta AI Labyrinth are gaining attention. Instead of simply blocking bots (which alerts them they’ve been detected), this tool feeds them an endless maze of AI-generated decoy content served at the edge, without ever hitting your origin servers. That maze:

  1. Wastes their computational resources
  2. Distracts them from your actual content
  3. Creates no value for their training data
  4. Identifies their behavioral patterns for better detection

Other effective strategies include:

  • Progressive rate limiting: Start with generous limits and gradually make them more restrictive for suspicious traffic patterns (see the sketch after this list)
  • Geographic filtering with challenges: Implement CAPTCHA or JavaScript challenges for visitors from countries outside your target market
  • Intelligent CDN configurations: Set up content delivery rules that serve different content to suspected bots versus human visitors
  • Advanced logging and monitoring: Deploy tools that can identify patterns in crawler behavior to better distinguish between legitimate and problematic traffic
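To make the progressive rate limiting idea concrete, here is a minimal sketch in Python, not tied to any particular CDN or WAF product, of a per-client budget that shrinks each time a client blows through its limit. The window size and thresholds are illustrative, not recommendations.

    # Minimal sketch of progressive rate limiting: clients that repeatedly
    # exceed their request budget get a progressively smaller budget.
    # All thresholds below are illustrative.
    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60
    BASE_LIMIT = 120   # generous starting budget per window
    MIN_LIMIT = 5      # floor for repeat offenders

    class ProgressiveRateLimiter:
        def __init__(self):
            self.requests = defaultdict(deque)             # client -> recent timestamps
            self.limits = defaultdict(lambda: BASE_LIMIT)  # client -> current budget

        def allow(self, client_id: str) -> bool:
            now = time.monotonic()
            window = self.requests[client_id]
            # Drop timestamps that have aged out of the current window
            while window and now - window[0] > WINDOW_SECONDS:
                window.popleft()
            if len(window) >= self.limits[client_id]:
                # Budget exceeded: halve it for this client, down to the floor
                self.limits[client_id] = max(MIN_LIMIT, self.limits[client_id] // 2)
                return False
            window.append(now)
            return True

    # Usage: before serving a request, check limiter.allow(client_ip_or_user_agent)
    limiter = ProgressiveRateLimiter()

In practice you would key this on some combination of IP range, ASN, and user agent and back it with shared storage, but the shape of the logic is the same.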

Tools like AI Labyrinth essentially turn generative AI against itself, using it as a defensive weapon, while the other measures above help protect legitimate websites without disrupting real human visitors.

The Desperate Measures: How Bad Has It Gotten?

The situation has become so dire that website owners are taking increasingly drastic actions that would have seemed unthinkable just a few years ago:

  • Blocking entire countries: Kevin Fenzi, a member of the Fedora Pagure project’s sysadmin team, reported having to block all traffic from Brazil after repeated attempts to mitigate bot traffic failed
  • Creating hostile tarpits: Some developers have deployed Nepenthes and similar tools that deliberately waste bot resources by trapping them in infinite generated content mazes (a bare-bones illustration follows this list)
  • Building computational barriers: Xe Iaso, a developer overwhelmed by Amazon’s bots, created “Anubis,” a system requiring visitors to solve computational puzzles before accessing content—even though this sometimes means mobile visitors wait up to two minutes for access
  • Abandoning platforms entirely: Some smaller sites have moved behind VPNs or switched to invite-only models, essentially removing themselves from the open web
  • Blocking major cloud providers: SourceHut took the extreme step of blocking Google Cloud and Microsoft Azure entirely, cutting off legitimate users who happen to use those services in the process
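To show the shape of the tarpit technique, here is a bare-bones Python sketch, a toy rather than Nepenthes or AI Labyrinth themselves, with a made-up path and port. Every response is just a page of random links that lead back into the maze, so a crawler that follows them burns time and bandwidth without ever touching real content.

    # Toy tarpit: every page is nothing but links to more generated pages.
    # A crawler that follows them wanders forever; real visitors never see it
    # because nothing on your actual site links here.
    import random
    import string
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def fake_links(count: int = 10) -> str:
        """Build random internal links that resolve back to this handler."""
        slugs = ("".join(random.choices(string.ascii_lowercase, k=8)) for _ in range(count))
        return "".join(f'<li><a href="/maze/{slug}">{slug}</a></li>' for slug in slugs)

    class TarpitHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = f"<html><body><ul>{fake_links()}</ul></body></html>".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("127.0.0.1", 8080), TarpitHandler).serve_forever()

Real tarpits add deliberate delays and plausible-looking generated text; the point here is only the endless structure.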

One small club website administrator described the situation bluntly: “These bots completely ignore robots.txt (we don’t want ANY indexing or crawling) and 99.9999% of traffic to the site is bots. And they are so pervasive that they caused my site to come to a crawl and often fail completely.”

This is the reality of running a website in 2025. Unless you have enterprise-level resources, you may find yourself making painful tradeoffs between availability, performance, and accessibility just to keep your site functioning at all.

The Content Creator’s AI Dilemma

As website owners in 2025, we face a fascinating paradox: Generative AI offers unprecedented tools to create, edit, and optimize content with remarkable efficiency. Yet the very same technology that empowers us is simultaneously threatening our digital infrastructure through aggressive AI crawlers that scrape content indiscriminately in a mad race to capture everything on the web as fast as possible, servers be damned.

The question isn’t whether to embrace AI—that ship has sailed.

The real question is: How do we benefit from AI’s creative potential while protecting our websites from becoming unwilling donors to every AI startup’s training data?

If you don’t absolutely need your content in every random AI startup’s training set, blocking these crawlers makes sense. But if you do want your content available for AI training, prepare to invest substantially in better website infrastructure—because the AI gold rush is taking a heavy toll on websites across the internet.

Don’t Let Bots Derail Your Digital Success

Your website is too important to leave vulnerable to the AI bot stampede. By taking proactive steps now, you protect not just your digital infrastructure, but your customer experience and business reputation.

Keep in mind, too, that this is not a set-it-and-forget-it fix. You absolutely need to monitor and evolve your approach over time.

With 20+ years of “zen learning” on the web, we understand these challenges at MODX. We’ve seen technology evolve—and we’ve stayed ahead of the curve. Our MODX Cloud hosting and selective content staging platform provides built-in protections against aggressive bots while delivering the speed, security, and staging workflow your business deserves, and our Open Source CMS has an unmatched security track record.

Ready to protect your website from the AI invasion? Contact our team today to discover how MODX Cloud can deliver faster, more secure sites that stand strong against even the most aggressive crawlers.


MODX is a leader in web content management, offering complete creative freedom, blazing-fast speed, and an unmatched security track record since 2004.