It sounds like some kind of rate limiting. There are several Apache modules that will do that, mod_qos being one of them. Usually they return a short body along with the 403 Forbidden status code, giving a few more details, such as a link to an Acceptable Usage Policy or similar. See if your scraper script can show you that body in addition to the 403 error code. There could also be a reverse proxy in front of Apache doing the limiting.
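If your scraper is written in Python, something along these lines would print the body that comes with the 403. This is only a sketch using the standard library; the names error_details and fetch are made up for illustration, and your script may use a different HTTP client entirely.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def error_details(err: urllib.error.HTTPError, limit: int = 2000) -> str:
    """Return the status line plus the start of the error response body."""
    body = err.read(limit)  # the error object is also a readable response
    charset = err.headers.get_content_charset() if err.headers else None
    text = body.decode(charset or "utf-8", errors="replace")
    return f"{err.code} {err.reason}\n{text}"


def fetch(url: str) -> bytes:
    """Fetch a URL; on an HTTP error, print the details before re-raising."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        print(error_details(err))
        raise
```

Whatever the server put in the body (a policy link, a block reason) then shows up in your log next to the status code.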
As a solution, keep the number of concurrent requests to the same site low (ideally 1), and obey robots.txt! Also take note of the destination site's Terms of Service and Acceptable Usage Policy.
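For the robots.txt part, Python's standard library already has a parser. A minimal sketch, assuming a Python scraper; the helper names and the "my-scraper" user agent are placeholders, not anything from your setup:

```python
import urllib.robotparser


def make_robot_rules(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Build a rule set from the text of a robots.txt file."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def allowed_urls(rules, urls, user_agent="my-scraper"):
    """Keep only the URLs this user agent is allowed to fetch."""
    return [u for u in urls if rules.can_fetch(user_agent, u)]
```

Against a live site you would normally call set_url() with the site's robots.txt URL and then read() on the parser instead of passing the text in yourself. Fetching the allowed URLs one after another in a plain loop keeps concurrency at 1.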
UPDATE: yes, mod_evasive will also do that. You can disable it (if you have control over the site) or tune its parameters. Specifically, in your case, it will block you if you make more than DOSSiteCount requests within DOSSiteInterval seconds. So you need to either increase the allowed number of requests or throttle your fetching speed (by limiting download parallelism and/or inserting a delay after each request).
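On the client side, the throttling could look roughly like this. It is a sketch, not a drop-in fix: min_delay and Throttle are hypothetical helpers, the safety factor is an arbitrary margin, and you have to plug in whatever DOSSiteCount and DOSSiteInterval the target server actually uses (you may not know them if the site is not yours).

```python
import time


def min_delay(site_count: int, site_interval: float, safety: float = 2.0) -> float:
    """Delay between requests that stays under site_count per site_interval.

    safety > 1 leaves headroom; the value here is an assumption.
    """
    return safety * site_interval / site_count


class Throttle:
    """Ensure at least `delay` seconds pass between consecutive calls to wait()."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call throttle.wait() right before each request in a single-threaded fetch loop. For example, with an illustrative DOSSiteCount of 50 and DOSSiteInterval of 1, min_delay(50, 1.0) gives 0.04 seconds between requests.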