Lefora Free Forum

scraping prevention

novice - member
15 posts

Some sites, including Betfair, appear to be employing some means to prevent scraping.

When I scrape the prices from Betfair once per second, they only update after five seconds. On another site, dynamic data appears static when scraped.

Anybody know how they do this and how we get round it?


regular - member
134 posts

If you wish to scrape live Betfair prices then you need to emulate a normal browser login. The login requirement was added when data charges were brought in. You have to log in to identify yourself so that your number of requests can be correctly tallied.

If a site behaves differently when scraped then your browser emulation probably isn't convincing enough. Check that you're sending the right headers and handling cookies properly.
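
As a rough illustration of "right headers and handling cookies", here is a minimal Python sketch using only the standard library. The header values and the function name `make_browser_like_opener` are assumptions for illustration; copy the real header values from your own browser's requests. It does not perform the Betfair login itself.

```python
import urllib.request
from http.cookiejar import CookieJar

# Illustrative values only (assumption): replace with what your browser sends.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
    "Accept-Language": "en-GB,en;q=0.9",
}

def make_browser_like_opener():
    """Return (opener, jar). The opener keeps cookies in jar between requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)  # stores and resends cookies
    )
    # addheaders is a list of (name, value) pairs added to every request.
    opener.addheaders = list(BROWSER_HEADERS.items())
    return opener, jar
```

Every request then goes through the same opener, e.g. `opener.open(url).read()`, so cookies set by one response are sent back with the next request.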

novice - member
15 posts

Thanks, that explains Betfair, but with regard to the other site, my code works fine on another PC, but not on my own. And it's unlikely that my PC has been blocked in any way.

regular - member
134 posts

"Dynamic data appears static" sounds more like a caching issue, which could be caused by many factors. Try adding anti-caching headers. If you are adding time-based parameters to the request, check the time, time zone, etc. For more clues you'd need to compare the full exchange of headers over several requests from the browser, from your code, and from your code on the other PC.
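
One way to do the header comparison, as a small Python sketch: capture the headers from each client (for example with a proxy or the browser's developer tools) into dictionaries, then list what differs. The function name `diff_headers` and the sample values in the usage note are hypothetical.

```python
def diff_headers(a, b):
    """Return {name: (value_in_a, value_in_b)} for headers that differ.

    Names are compared case-insensitively; a header missing on one
    side is reported as None.
    """
    a_norm = {k.lower(): v for k, v in a.items()}
    b_norm = {k.lower(): v for k, v in b.items()}
    return {
        name: (a_norm.get(name), b_norm.get(name))
        for name in sorted(a_norm.keys() | b_norm.keys())
        if a_norm.get(name) != b_norm.get(name)
    }
```

For example, `diff_headers({"Cache-Control": "no-cache"}, {"Pragma": "no-cache"})` reports both headers, each present on one side only. Running it on response headers as well as request headers may show whether a cache is answering instead of the site.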

novice - member
15 posts

Not sure what the problem is, but by stopping and restarting my program, it somehow retrieves dynamic data, but that's impractical.

A workaround I found is to include a random number in the url. That'll do me until I find something better.
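
For anyone wanting to copy the random-number workaround, here is a minimal Python sketch. Caches key on the full URL, so a query string that changes on every request forces a fresh fetch. The parameter name `_` and the function name `add_cache_buster` are arbitrary choices for illustration, and this assumes the site ignores unknown parameters.

```python
import random
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_cache_buster(url, param="_"):
    """Return url with an extra random query parameter appended."""
    parts = urlsplit(url)
    # Keep the existing parameters, in order, including blank ones.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, str(random.randrange(10**9))))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Call it once per request, e.g. `add_cache_buster("http://example.com/prices?mi=123")`, so each URL differs from the last.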


superstar - member
230 posts

That's definitely a caching problem. What language/libraries are you using? Is the other PC running the same operating system?

You're not automating a web browser component, are you?

novice - member
34 posts

I'd agree with birchy. I used to have problems like that when using Flash + PHP to scrape sites: the program just returned cached data unless I added random numbers to the URL to make each request unique.

novice - member
15 posts

I'm using Visual Basic Express and also MS Access on Vista. The PC where it works runs XP.

I've tried ActiveX Data Objects, MS HTML, MS XML 6, WinHTTP 5.

Not web browser component.

Data refreshes okay with IE.


superstar - member
230 posts

Are you using the "Pragma: no-cache" header? If not, try it.
I'm no longer scraping Betfair, but I remember that I always used the main loader URL and not the one they use for refreshing:

http://uk.site.sports.betfair.com/betting/api/json/getBootstrapData.do?mi=xxxxx

Also note that I never used any of the URL parameters. And do make sure that you're not using the "prevcache" parameter... although that should be obvious...
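
For reference, sending the no-cache headers looks roughly like this in Python's standard library. This is only a sketch: `build_no_cache_request` is a hypothetical helper name, and whether the headers are honoured depends on the HTTP client and any caches in between.

```python
import urllib.request

NO_CACHE_HEADERS = {
    "Pragma": "no-cache",         # older HTTP/1.0-style directive
    "Cache-Control": "no-cache",  # HTTP/1.1 directive
}

def build_no_cache_request(url):
    """Build a request asking caches to revalidate rather than reuse a copy."""
    return urllib.request.Request(url, headers=NO_CACHE_HEADERS)
```

The request is then sent with `urllib.request.urlopen(build_no_cache_request(url))`.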

novice - member
15 posts

.setRequestHeader "Pragma", "no-cache"
.setRequestHeader "Cache-Control", "no-cache"

