I've done a little Java and understand the concept of classes and the whole OO thing, but I've not previously had a specific project to keep me interested. To cut a long story short, I need to do some web scraping to gather some info from several websites. I'd like to use the standard ("native") Java library, and I'm not keen on using any third-party libraries because Java already feels very bloated. What I'm asking for is some advice and/or source code on the simplest and most efficient way to:
- download the HTML source from a given URL.
- parse the HTML. In VB6, I wrote a simple function like this: GetSubstring(tag1, tag2), which returned the string between the given tags/strings.
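
To show roughly what I have in mind, here is a minimal sketch of the two steps above using only the standard library. The class name Scraper, the method names, and the extra source parameter on getSubstring are my own assumptions for illustration. It also assumes Java 9 or later (for readAllBytes) and UTF-8 pages, and it is plain string searching rather than real HTML parsing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative, not from any library.
final class Scraper {

    // Downloads the raw HTML of a page using only JDK classes.
    // Assumption: the page is UTF-8; a sturdier version would read the
    // charset from the Content-Type header instead.
    static String download(String address) throws IOException {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Returns the text between the first occurrence of tag1 and the next
    // occurrence of tag2 after it, or null if either marker is missing.
    static String getSubstring(String source, String tag1, String tag2) {
        int start = source.indexOf(tag1);
        if (start < 0) {
            return null;
        }
        start += tag1.length();
        int end = source.indexOf(tag2, start);
        if (end < 0) {
            return null;
        }
        return source.substring(start, end);
    }
}
```

Usage would be something like Scraper.getSubstring(Scraper.download(someUrl), "<title>", "</title>"), where someUrl is whatever page I'm scraping.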
Thanks in advance.
