Preventing Web Scraping
When we have a full proxy between the Internet and our LAN, we can do almost anything, including protecting our servers ;-). This is what a WAF does: it protects against Web Application Vulnerabilities, Web Scraping and DoS Attacks. This time, I want to write about Web Scraping, a technique for automatically downloading an entire website in order to extract data. Typical targets are competitor prices to track, email addresses, directory listings used to obtain leads and marketing information, and images, financial information or other product data from competitors' websites. Web Scraping is also used to copy a website for phishing attacks.
There are many tools for extracting data from websites in order to clone or analyse them, from simple ones such as cURL or Wget to more advanced ones such as HTTrack. For instance, two summers ago I used the Social-Engineer Toolkit (SET) in a talk called “Innovation, yes but with Security” to build a PoC of a phishing attack in which I copied the Gmail and elpais.com websites.
Although only a few companies are worried about this threat so far, more and more of them are becoming aware of the need to protect their public data for competitive reasons. Next, we are going to look at some Web Scraping mitigation techniques to protect our websites.
Bot detection
In this method, the Web Scraping prevention system applies several checks to detect bots. For instance, a rapid surfing check counts how many different URLs the client has loaded and unloaded from the application within a defined period. Another check ensures that the client accepts cookies and processes JavaScript. Yet another check can use JavaScript again to determine whether the client behaves like a human being or a bot.
Bot detection configuration in BIG-IP ASM
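To make the rapid surfing check more concrete, here is a minimal sketch of a sliding-window counter of distinct URLs per client. It is only an illustration under stated assumptions, not how BIG-IP ASM implements the check: the class name `RapidSurfingDetector`, the thresholds and the use of a client ID (for example an IP address) are hypothetical, and it counts page loads only, ignoring unloads.

```python
import time
from collections import defaultdict, deque


class RapidSurfingDetector:
    """Hypothetical sketch: flag clients that request too many distinct URLs
    within a sliding time window. Thresholds are example values only."""

    def __init__(self, max_distinct_urls=20, window_seconds=10.0):
        self.max_distinct_urls = max_distinct_urls
        self.window_seconds = window_seconds
        # client_id -> deque of (timestamp, url) pairs still inside the window
        self._history = defaultdict(deque)

    def record(self, client_id, url, now=None):
        """Record one page load; return True if the client looks like a bot."""
        now = time.monotonic() if now is None else now
        history = self._history[client_id]
        history.append((now, url))
        # Drop requests that have fallen out of the sliding window
        while history and now - history[0][0] > self.window_seconds:
            history.popleft()
        distinct_urls = {u for _, u in history}
        return len(distinct_urls) > self.max_distinct_urls


# Usage: 4 different product pages in 1.5 seconds with a limit of 3 URLs per 5 seconds
detector = RapidSurfingDetector(max_distinct_urls=3, window_seconds=5.0)
for i, t in enumerate([0.0, 0.5, 1.0, 1.5]):
    flagged = detector.record("10.0.0.7", f"/product/{i}", now=t)
print(flagged)  # True
```

Counting distinct URLs rather than raw requests means that a human reloading the same page does not trip the check, while a crawler walking through a catalogue does.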
Session Anomaly detection
This method detects clients that open a large number of new sessions. One check measures the rate of new sessions per second, and another detects a spike in the number of new sessions. The method can also use an IP reputation database to detect malicious IP addresses, which is another indicator for triggering a violation.
Session Anomaly detection configuration in BIG-IP ASM
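The three indicators described above can be sketched as follows. This is a hypothetical illustration, not the BIG-IP ASM algorithm: `SessionAnomalyDetector`, its thresholds and the `bad_ips` set (standing in for a real IP reputation database) are assumptions. The rate check works per client over a short window, while the spike check compares the total number of new sessions in the current second with the average of the preceding seconds.

```python
from collections import defaultdict, deque


class SessionAnomalyDetector:
    """Hypothetical sketch: thresholds and the reputation set are example values."""

    def __init__(self, max_rate=2.0, window_seconds=5, spike_factor=3.0,
                 baseline_seconds=30, bad_ips=frozenset()):
        self.max_rate = max_rate                  # allowed new sessions/second per client
        self.window_seconds = window_seconds      # window used for the per-client rate
        self.spike_factor = spike_factor          # spike = this second > factor * baseline
        self.baseline_seconds = baseline_seconds  # seconds averaged to form the baseline
        self.bad_ips = set(bad_ips)               # stand-in for an IP reputation database
        self._client_sessions = defaultdict(deque)  # ip -> timestamps of new sessions
        self._per_second = defaultdict(int)         # whole second -> new sessions (all clients)

    def new_session(self, ip, now):
        """Register a new session at time `now` (seconds); return triggered violations."""
        violations = []

        # 1) Per-client rate of new sessions per second over the window
        times = self._client_sessions[ip]
        times.append(now)
        while now - times[0] >= self.window_seconds:
            times.popleft()
        if len(times) / self.window_seconds > self.max_rate:
            violations.append("rate")

        # 2) Spike: new sessions in this second vs. the average of previous seconds
        second = int(now)
        self._per_second[second] += 1
        for old in [s for s in self._per_second if s < second - self.baseline_seconds]:
            del self._per_second[old]  # keep memory bounded
        previous = [self._per_second.get(s, 0)
                    for s in range(second - self.baseline_seconds, second)]
        baseline = sum(previous) / len(previous)
        # With no previous traffic there is no baseline to compare against
        if baseline > 0 and self._per_second[second] > self.spike_factor * baseline:
            violations.append("spike")

        # 3) IP reputation lookup
        if ip in self.bad_ips:
            violations.append("ip_reputation")

        return violations
```

Returning a list of violation names, rather than a single boolean, mirrors the idea that each check is an independent indicator; a policy layer can then decide whether one indicator is enough to block or whether several must coincide.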
Fingerprinting
This method collects browser attributes to detect malicious users. Some of these attributes are browser APIs (such as the JavaScript APIs the browser supports), expressions, localization information, installed fonts, screen parameters, time and plugins.
Fingerprinting configuration in BIG-IP ASM
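On the server side, the attributes gathered in the browser can be reduced to a stable identifier. The sketch below is a hypothetical illustration, not BIG-IP ASM's fingerprinting: it assumes the client-side JavaScript has already sent the attributes as a JSON-like dict (the attribute names are made up), hashes them with SHA-256, and flags a fingerprint that shows up behind too many different sessions, a pattern typical of a scraper that keeps dropping cookies but keeps the same browser.

```python
import hashlib
import json
from collections import defaultdict


def fingerprint(attributes):
    """Return a stable ID for a dict of browser attributes (hypothetical format)."""
    # Canonical JSON: sorted keys and no whitespace, so equal dicts hash equally.
    # Lists (e.g. fonts) must be sent in a consistent order by the client.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FingerprintTracker:
    """Flags a browser fingerprint seen behind too many different sessions."""

    def __init__(self, max_sessions_per_fingerprint=3):
        self.max_sessions = max_sessions_per_fingerprint
        self._sessions = defaultdict(set)  # fingerprint -> set of session IDs

    def observe(self, session_id, attributes):
        """Record a session's attributes; return True if the fingerprint is suspicious."""
        fp = fingerprint(attributes)
        self._sessions[fp].add(session_id)
        return len(self._sessions[fp]) > self.max_sessions


# Usage with made-up attribute names mirroring the list above
attrs = {
    "js_apis": ["fetch", "WebGL"],
    "language": "es-ES",
    "fonts": ["Arial", "DejaVu Sans"],
    "screen": "1920x1080x24",
    "timezone": "Europe/Madrid",
    "plugins": [],
}
tracker = FingerprintTracker(max_sessions_per_fingerprint=2)
print([tracker.observe(f"session-{i}", attrs) for i in range(3)])
```

Real fingerprinting products weigh attributes and tolerate small changes (a new font, a resized window); an exact hash like this one is the simplest possible variant and is easy to evade by randomizing a single attribute.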
A year ago, Web Scraping was an unknown concept to me, but today preventing it is possible, and it is already a reality for many organizations that are worried about their public information.
Regards, my friends! Drop me a line with the first thing you are thinking!!!