birchy

Betfair Elite

Posts: 591 Member Since: May 11, 2008

Lead

August 28, 2008 15:48:46

Now that I'm a full-blown Linux user, all my old VB6 code has become semi-obsolete. Not that it's a major problem, because after several years of botting against Betfair, I have finally accepted defeat (sort of). On the plus side, moving to Linux has given me that extra push to learn C/C++ and Java, something I've been meaning to do for a couple of years but never bothered with because coding whatever I wanted in VB6 was always the easy option.

I've done a little Java and understand the concept of classes and the whole OO thing, but I've not previously had a specific project to keep me interested. To cut a long story short, I need to do some web scraping to gather info from several websites. I'd like to use "native" Java functions and I'm not keen on using any third-party libraries because Java is already very bloated. What I'm asking for is some advice and/or source code on the simplest and most efficient way to:
  1. download the HTML source from a given URL.
  2. parse the HTML. In VB6, I wrote a simple function like this: GetSubstring(tag1, tag2), which returned the string in between the given tags/strings.
I'm also interested in C++ code to achieve the same. I'm using Eclipse for Java and Code::Blocks for my C++ programming.

Thanks in advance.

myrddin

bot addict

Posts: 56 Member Since: June 2, 2008

#1 [url]

August 28, 2008 17:07:06

Can't help with the C++/Java scraping, sorry, but I just wondered how you were managing with online poker under Linux. I remember you were keen on the Betfair site a month or so ago; have you had any luck running their software under Linux?


nadat

rookie botter

Posts: 30 Member Since: June 26, 2008

#2 [url]

August 29, 2008 03:22:50

This should get you started in Java:
http://parthian-shot.blogspot.com/2007/09/html-screen-scraping-easy-way.html

If you don't really want to use an HTML parsing library (what'll you save, a few hundred KB?), read the page into a string from the connection's input stream. Everything up to that point is 'native', i.e. it needs nothing beyond the standard library.
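As a rough sketch of that approach, something along these lines would cover both parts of the question using only the standard library. The class name `Scraper` and the method names are made up for illustration, and `getSubstring` is only a guess at how the VB6 GetSubstring(tag1, tag2) behaved (first match, `null` when a marker is missing). It also assumes the page is UTF-8, which won't hold for every site.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; all names here are illustrative only.
class Scraper {

    // Read an entire stream into a String (assumes UTF-8 content).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Download the HTML source from a given URL.
    static String download(String address) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        }
    }

    // Return the text between the first tag1 and the next tag2,
    // or null if either marker is not found.
    static String getSubstring(String source, String tag1, String tag2) {
        int start = source.indexOf(tag1);
        if (start < 0) {
            return null;
        }
        start += tag1.length();
        int end = source.indexOf(tag2, start);
        if (end < 0) {
            return null;
        }
        return source.substring(start, end);
    }
}
```

You'd then call something like `Scraper.getSubstring(Scraper.download(someUrl), "<title>", "</title>")`. Plain string searching like this is brittle if the site changes its markup, which is the main argument for a real parser.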

If you change your mind and decide to use someone else's parsing code, there are plenty of options:
http://java-source.net/open-source/html-parsers

And, if you need cookie handling:
http://blogs.sun.com/CoreJavaTechTips/entry/cookie_handling_in_java_se
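For what it's worth, a minimal sketch of cookie handling with the `java.net.CookieManager` class (Java SE 6 and later) might look like the following. The class name `Cookies` and the method `install` are invented for the example, and accepting all cookies is just an assumption to keep it short; a stricter policy may suit you better.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;

// Hypothetical helper; the names are illustrative only.
class Cookies {

    // Install a JVM-wide, in-memory cookie manager so that later
    // URLConnection requests store and resend cookies automatically.
    static CookieManager install() {
        // A null store means "use the default in-memory cookie store".
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }
}
```

Call `Cookies.install()` once before opening any connections; afterwards `manager.getCookieStore().getCookies()` lists whatever the sites have set.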


LeforaGuest

Posts: 0 Member Since: May 29, 2017

#5 [url]

September 9, 2008 20:28:39

Looks that way, doesn't it? Without digging around, I'd guess that it relies on JavaScript to redirect the post to a secure address.
Why screen scrape though? The Betdaq API is free.


austinpodhorzer

bot addict

Posts: 131 Member Since: May 11, 2008

#6 [url]

September 9, 2008 20:31:09

Hmm ... well, I surfed via a web proxy and it seemed to send my username and password over plain HTTP ... there for all to see.

Scraping still has its occasional uses...


birchy

Betfair Elite

Posts: 591 Member Since: May 11, 2008

#7 [url]

September 9, 2008 22:19:37

Dunno what the situation is now, but several months ago I decided that if I was gonna go scraping again, I'd go to WBX because of the cheaper commission. I'm getting more fun out of poker at the moment. I actually feel in control of my own odds, and the 30% rakeback makes it a whole lot sweeter.

www.bespokebots.com

"This time next year Rodney, we'll be millionaires!"
