Bots and Automated Attacks

Prevent Web Scraping by Applying the Pyramid of Pain

The Bots Pyramid of Pain: a framework for effective bot defense.
March 28, 2025
4 min. read

The Content Scraping Problem

Bots scraping websites for their content is a problem that’s not going away, as we highlight in our 2025 Advanced Persistent Bots report. In this article we present a Pyramid of Pain specifically for defending websites and APIs against bots and automated attacks. The power of this framework is in understanding the value and effectiveness of different ways of detecting and blocking bots. The key concept, as pioneered by David J. Bianco,1 is to assess how much pain a specific detection, response, or prevention mechanism would inflict on the bot operators.

Bots Pyramid of Pain

The pyramid in this framework consists of a number of layers, as shown in Figure 1. Each layer has a description along with the level of pain it inflicts on the bot operator. The bottom layer causes bot operators the least pain, and the top layer the most.


Figure 1: A Pyramid of Pain for defending websites and APIs against automated bots.

Blocking Based on Request Headers

At this lowest layer of the pyramid, requests are blocked based on the HTTP headers present (or absent) and their values, the most straightforward being the User-Agent header. This inflicts a trivial level of pain: headers are trivial to fake, and bot operators need only provide plausibly valid values. Modifying or adding a header is literally a one-line code change for bot operators.
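
To make the triviality concrete, here is a minimal sketch of this layer on both sides, assuming a Flask application on the defender's side; the blocklist entries and route structure are illustrative assumptions, not rules from any real product:

    # Defender side: a minimal sketch of header-based blocking, assuming a
    # Flask app. The blocklist entries are illustrative, not a real rule set.
    from flask import Flask, abort, request

    app = Flask(__name__)

    BLOCKED_UA_SUBSTRINGS = ("python-requests", "curl", "scrapy")

    @app.before_request
    def block_suspicious_headers():
        ua = request.headers.get("User-Agent", "")
        # Reject requests with no User-Agent, or one matching a known bot string.
        if not ua or any(s in ua.lower() for s in BLOCKED_UA_SUBSTRINGS):
            abort(403)

And the bot operator's corresponding one-line countermeasure:

    import requests

    # Spoof a plausible browser User-Agent and the block above never fires.
    requests.get("https://example.com/",
                 headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"})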

Blocking Based on IP Address

At this layer, connections are blocked or rate-limited based on the IP address and/or any of its associated metadata (such as IP reputation, ASN, country, or geolocation). This poses no real challenge to bot operators. They can easily either "go low and slow" (i.e., send only a few requests per IP address per hour or day), or they can utilize a proxy network.
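
As a rough illustration, a per-IP sliding-window rate limiter might look like the sketch below; the window size, threshold, and in-process storage are illustrative assumptions (production deployments typically use a shared store such as Redis):

    # Minimal sketch of per-IP rate limiting with a sliding window.
    # Thresholds and in-memory storage are illustrative assumptions.
    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60
    MAX_REQUESTS = 100

    _requests = defaultdict(deque)  # ip -> recent request timestamps

    def allow_request(ip: str) -> bool:
        now = time.time()
        window = _requests[ip]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_REQUESTS:
            return False  # rate limit exceeded; block or challenge
        window.append(now)
        return True

Note how both evasions defeat this directly: "low and slow" traffic never fills the window, and a proxy network hands the bot a fresh counter for every IP.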

Blocking Based on Turing Capability

At this layer, bots are explicitly challenged to prove they can solve puzzles only a human can solve (often referred to as CAPTCHAs). That's the theory, at least. In practice, this mostly inflicts pain on actual humans. Bot operators will leverage either freely available software solutions or crime-as-a-service offerings to solve the puzzles. While this is operationally a simple level of pain to inflict, it does require the bot maintainer to have a basic level of resourcefulness and awareness of the crime ecosystem.
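
To see why this layer costs operators so little, consider a hypothetical sketch of bot-side solver integration; the solver endpoint, payload, and solve_captcha helper are all invented for illustration, though real solving services expose similar submit-and-poll HTTP APIs:

    # Hypothetical sketch of bot-side CAPTCHA outsourcing. The endpoint and
    # payload are invented for illustration; real crime-as-a-service solvers
    # expose similar submit-and-poll HTTP APIs.
    import requests

    def solve_captcha(site_key: str, page_url: str) -> str:
        # Ship the challenge off to a (hypothetical) solving service and
        # return the token the target site expects.
        resp = requests.post("https://solver.example/api/solve",
                             json={"site_key": site_key, "url": page_url})
        return resp.json()["token"]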

Blocking Based on Tools

At this layer, defenses attempt to detect the bot's tooling, for example, Browser Automation Studio. If the use of a given tool can be detected and blocked, bot operators are forced to reimplement their scraper using other tooling. The more tooling that can be detected and blocked, the more time and effort bot operators are forced to spend on specialized tooling.
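
A minimal sketch of server-side tooling heuristics is shown below; the markers and header check are illustrative assumptions, and commercial defenses combine far more signals (including client-side checks such as the navigator.webdriver flag):

    # Minimal sketch of server-side tooling heuristics. The markers are
    # illustrative; real defenses combine many more server- and client-side
    # signals.
    AUTOMATION_UA_MARKERS = ("headlesschrome", "phantomjs")

    def looks_automated(headers: dict) -> bool:
        ua = headers.get("User-Agent", "").lower()
        if any(marker in ua for marker in AUTOMATION_UA_MARKERS):
            return True
        # Real browsers virtually always send Accept-Language; many automation
        # stacks omit it unless explicitly configured. (Use a case-insensitive
        # lookup in a real server; a plain dict is used here for brevity.)
        if "Accept-Language" not in headers:
            return True
        return False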

Blocking Based on Browser & Device Fingerprint

At this layer, the browser or client device is interrogated. Assuming sophisticated fingerprinting techniques are available, this inflicts a challenging level of pain. The key aspect of this pain is that while a bot maintainer can conceivably pass a single fingerprint check, they need to do so in aggregate across large volumes of requests. While digital fingerprints can be purchased, generating hundreds or thousands of internally consistent fingerprints sufficient to blend in with legitimate user fingerprints is no small feat.
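
As a simplified illustration of what "internally consistent" means, a defense might cross-check fingerprint fields against each other; the field names and checks below are assumptions for illustration, as real fingerprints combine dozens of browser, TLS, and device attributes:

    # Minimal sketch of an internal-consistency check across fingerprint
    # fields. Field names and checks are illustrative; real fingerprints
    # combine dozens of browser, TLS, and device attributes.
    def is_consistent(fp: dict) -> bool:
        ua = fp.get("user_agent", "")
        # The OS claimed in the User-Agent should match navigator.platform.
        if "Windows" in ua and not fp.get("platform", "").startswith("Win"):
            return False
        # A mobile User-Agent should not report a large desktop viewport.
        if "Mobile" in ua and fp.get("screen_width", 0) > 1400:
            return False
        return True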

Blocking Based on Behavior

Defenses at this layer are the most complex to implement, prohibitively so if the solution is being developed in-house. At this layer, bots must be identified and blocked based on their behavioral patterns. This is the maximum pain that can be inflicted, as it strikes at the heart of the bot operators' objective: forcing a bot maintainer to convincingly imitate legitimate user behavior is a very heavy tax indeed.
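
As one simplified example of a behavioral signal, consider inter-request timing; the threshold and minimum sample size below are illustrative assumptions, not tuned values:

    # Minimal sketch of one behavioral signal: inter-request timing.
    # Human browsing produces irregular gaps; naive bots are metronomic.
    # The threshold and sample size are illustrative, not tuned values.
    import statistics

    def timing_looks_scripted(timestamps: list[float]) -> bool:
        if len(timestamps) < 5:
            return False  # not enough history to judge
        gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
        # Near-zero variance in request spacing is a strong automation signal.
        return statistics.stdev(gaps) < 0.05

Production systems combine many such features, such as navigation order, scroll and mouse telemetry, and session-level statistics, often feeding them into machine learning models.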

Anti-Scraping Measure Considerations

When it comes to applying anti-scraping measures, there are two choices: deploy a partial solution or a full solution. A partial solution covers two or three of the bottom layers of the pyramid. In the early days of scraping automation, today's partial solutions would have been very effective; unfortunately, this is no longer the case. For example, CAPTCHAs used to be very effective at blocking scrapers and are now simple to bypass.

If you deploy a partially effective anti-scraping measure you will see two main effects:

  1. Less capable scrapers will leave you alone. These are casual scrapers, or scrapers for whom the value of your content is relatively low. Unless your content is especially high value, you will deflect a certain percentage of your scraper volume this way.
  2. More capable scrapers will become less visible to you. This is the major downside of partially effective measures. In effect, you force scrapers to become better at imitating legitimate traffic, which inherently means you lose visibility into the problem you are facing.

Applying Anti-Scraping Measures

Whatever course of action you choose, keep the Pyramid of Pain in mind. In general, avoid picking a solution that inflicts inconsequential pain on the scrapers targeting your application. It will do little to deter the scrapers, significant time and effort will have been wasted, and the business will be left with a false impression of having mitigated the problem. At worst, visibility of bot traffic will be lost as bot operators retool and make their traffic appear more legitimate.

Conclusion

Content scraping by bots is an evolving challenge that demands increasingly sophisticated anti-scraping measures. The Pyramid of Pain framework provides a clear lens through which to understand the effectiveness of anti-scraping measures available to you.

When deciding on the right cost-benefit mix for your situation, it's critical to understand the key factors that influence your decision, as well as their inherent risks. Depending on context, we recommend a spectrum of actions, from deferring action altogether through to deploying a full anti-scraping solution.

Authors & Contributors
Merlyn Albery-Speyer (Author)
Sr Cybersecurity Threat Researcher
Tafara Muwandi (Contributor)
RVP of Data Science
Malcolm Heath (Contributor)
Principal Threat Researcher
David Warburton (Contributor)
Director, F5 Labs
Footnotes

1. https://www.sans.org/tools/the-pyramid-of-pain/
