Download of entire web site

David White cavefish at pacbell.net
Sun Oct 6 01:38:49 PDT 2002


Has anyone on this list heard of web crawlers that download an entire
site - including hundreds of megabytes of images and thousands of
dynamically generated pages?  It's almost as if someone were trying to
set up a duplicate site.

The download apparently came from one computer - but it used 4 IP
addresses, each with a different User-Agent string: Win95, Mac_PowerPC,
Win2000, and Konqueror.  The session cookie was the same for all 4 IP
addresses, suggesting that it was a single computer - but the different
User-Agent strings suggest that it was trying to make itself less
conspicuous.  I've blocked the 4 addresses.
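
The pattern described above (one session cookie seen from several IP
addresses, each with a different User-Agent string) can be checked for
mechanically.  What follows is only a minimal sketch in Python, assuming
the access log has already been parsed into (ip, user_agent, session_id)
tuples.  The function name, the tuple layout, and the sample addresses
are hypothetical and are not taken from the actual server or its logs.

```python
from collections import defaultdict


def find_shared_sessions(records, min_ips=2, min_agents=2):
    """Return sessions seen from several IPs with several user agents.

    records is an iterable of (ip, user_agent, session_id) tuples.
    This layout is an assumption; adapt it to the real log format.
    """
    ips = defaultdict(set)
    agents = defaultdict(set)
    for ip, user_agent, session_id in records:
        if not session_id:
            continue  # requests without a session cookie cannot be linked
        ips[session_id].add(ip)
        agents[session_id].add(user_agent)

    # Keep only sessions that cross both thresholds.
    return {
        sid: (sorted(ips[sid]), sorted(agents[sid]))
        for sid in ips
        if len(ips[sid]) >= min_ips and len(agents[sid]) >= min_agents
    }


# Hypothetical sample data using documentation-only addresses.
sample = [
    ("192.0.2.1", "Win95", "abc123"),
    ("192.0.2.2", "Mac_PowerPC", "abc123"),
    ("192.0.2.3", "Win2000", "abc123"),
    ("192.0.2.4", "Konqueror", "abc123"),
    ("198.51.100.7", "Win2000", "zzz999"),
]
suspicious = find_shared_sessions(sample)
```

Requiring several user agents as well as several addresses is a design
choice: a legitimate visitor behind a pool of proxies may show up under
more than one IP address, but would normally keep the same User-Agent
string throughout a session.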

This doesn't seem like legitimate web crawler behavior.  Has anyone
encountered this before?  I'm worried that someone is trying to do
something bad - but so far I can't figure out what.

I'm sorry this isn't directly related to Linux - except that my server
is running Linux.

Thanks in advance,
David White
cavefish at pacbell.net
(I've been monitoring this list for years, but I can't remember whether
I've posted before.)



