guiness

rookie botter

Posts: 39 Member Since: August 28, 2012

Lead

October 28, 2013 22:41:30

Hi guys,

I've found some nice data on the Timeform website (thanks to a member here for pointing me to it).

Timeform are offering a free one-month API trial.

Has anyone written a wrapper in Python to access this data, or have any experience of this? It would save me a lot of effort learning JSON.


Thanks Kindly

cran

bot addict

Posts: 72 Member Since:June 11, 2013

#1 [url]

October 31, 2013 07:35:57

Does the API get you anything you can't get by scraping?

I had a look at the API spec a while back and (only scanning quickly) didn't see anything that isn't already available from scraping the web page that would make it worth the effort of coding.

But if there is a Python wrapper already I'd also be interested in trying it out.


guiness

rookie botter

Posts: 39 Member Since:August 28, 2012

#2 [url]

November 15, 2013 18:15:49

Hi, good point.

There was a lot of data in the Timeform example files that wasn't on the website, e.g. jockey info, but I figured I didn't actually need most of it.

The key data I want is on the pages that can be scraped.

Does anyone have Python code they have written to scrape between any two dates? If not, no worries, I'll write my own and can share it. It would just save reinventing the wheel.



 


cran

bot addict

Posts: 72 Member Since:June 11, 2013

#3 [url]

November 15, 2013 18:38:59

My original scraping for loading a database was done with C# and it broke when they changed the page layouts a year or more ago, and as I wasn't doing much with it I never got round to rewriting it.

All I have for the current pages is a bot (in Python) that scrapes the TF results pages for working out the Naps scores on the GT forum. It does involve dates.  I'll take a look over the weekend and see if it has anything useful that's worth uploading.

It uses a mix of BeautifulSoup and re, if my memory is correct.


birchy

Betfair Elite

Posts: 591 Member Since:May 11, 2008

#4 [url]

November 15, 2013 21:20:58

There used to be a thread on the BDP forum written by one of the Betfair forum admins. It was a step-by-step guide on how to scrape the Timeform pages using .NET (C#, I think). Does it have to be Timeform, or can you get the same data from the Sporting Life, Racing Post, etc. websites?

www.bespokebots.com

"This time next year Rodney, we'll be millionaires!"


cran

bot addict

Posts: 72 Member Since:June 11, 2013

#5 [url]

November 16, 2013 11:19:16

Here's my Python code that scrapes the Timeform Results.

It uses BeautifulSoup (BS does most things, but the Python HTML parser doesn't always work correctly, so I have used re for some bits).
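
As a rough illustration of the re fallback mentioned above, here is a minimal sketch that pulls the runner count and handicap flag out of the text of one race block. The function name parse_race_header and the sample string are made up for illustration, and the patterns assume the page prints something like "Runners: 12" and "(Handicap)"; the real page text may differ.

```python
import re

def parse_race_header(text):
    """Return (runners, handicap) from the text of one race block.

    Hypothetical helper: the patterns are assumptions about the page text.
    """
    upper = text.upper()
    # Match the literal text "HANDICAP)", so the parenthesis is escaped
    handicap = re.search(r'HANDICAP\)', upper) is not None
    # "RUNNERS:" followed by optional whitespace and one or more digits
    match = re.search(r'RUNNERS:\s*(\d+)', upper)
    runners = int(match.group(1)) if match else 0
    return runners, handicap

sample = 'Betfair Stakes (Handicap) Runners: 12 Going: Soft'  # made-up text
print(parse_race_header(sample))
```

Upper-casing the text first keeps the patterns simple, and a missing runner count falls back to 0 instead of raising.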

The code below is specific to the results data that I need, but it shows that scraping with Python is quite easy:

import sys
import re
import difflib
import mysql.connector
import requests
import hashlib

from datetime      import datetime, timedelta
from calendar      import monthrange
from bs4           import BeautifulSoup
from decimal       import *
from adn_functions import *

...

    def get_race_results(self, race_date):
    
        meetings     = [] # meetings for the day
        race_results = [] # working results list for processing each race
        results      = [] # full results list returned from function
        
        tf_url       = 'http://form.horseracing.betfair.com/'
        
        tf_page      = 'daypage?date=' + race_date.strftime('%Y%m%d')
        soup         = BeautifulSoup(requests.get(tf_url + tf_page).content, 'html.parser')  # name the parser explicitly
        
        meeting_urls = soup.find_all(attrs={'data-location': 'RACING_COUNTRY_GB_IE'})
        
        for meet in meeting_urls:
            
            for href in meet.find_all('a', class_='course-name'):
                try:
                    track_name = str(href.string).replace('\r', '').strip()
                    track_url  = str(href['href']).replace('\r', '').strip()
                    meetings.append([track_name, track_url])
                except:
                    pass
                
        for meet in meetings:
            
            soup  = BeautifulSoup(requests.get(tf_url + meet[1]).content, 'html.parser')  # name the parser explicitly
            races = soup.div.find_all(class_='courseschedule-submodule')
            
            for race in races:
                
                race_time = str(race.find('abbr', class_='dtstart').string)
                race_str  = str(race).upper()  # Python 3 html parser is crap so use Regular Expressions for handicap and runners
                handicap  = re.search(r'HANDICAP\)', race_str) is not None
                runners   = re.search(r"(RUNNERS: )(\d*)?( )?", race_str)
                
                if runners and runners.group(2):  # guard against "RUNNERS: " with no digits after it
                    runners = int(runners.group(2))
                else:
                    runners = 0
                    
                if (runners >= 16 and handicap):
                    places = 4
                elif (runners >= 8):
                    places = 3
                elif (runners >= 3):
                    places = 2
                else:
                    places = 1
                             
                result = ([ meet[0], race_date, race_time, runners, places, [], [], [], [], handicap ])
    
                try:
                    horses = race.find('tbody').find_all('tr')
                except:
                    horses = []
               
                for horse in horses:
                    
                    try:
                        
                        pos_str     = str(horse.find('span', class_='pos').string)
                        horse_tf    = str(horse.find('td',   class_='horse').find('a').string)
    
                        try:
                            bsp     = float(str(horse.find('span', class_='bsp').string)) + 0.0001 # + 0.0001 to handle float rounding errors
                        except:
                            bsp     = 1.0
                        try:    
                            place   = float(str(horse.find('td', class_='place').string)) + 0.0001 # + 0.0001 to handle float rounding errors
                        except:
                            place   = 1.0
                        try:
                            int_pos = int(pos_str)
                        except:
                            int_pos = 99
                            
                        if (int_pos == 1):        # WIN
                            
                            W   = bsp   - 1.0
                            P   = place - 1.0
                            ewW = W * 0.5
                            ewP = P * 0.5
            
                            result[5].append([pos_str, horse_tf, bsp, place, W, P, ewW, ewP, 0, 0])
                            
                        elif (int_pos <= places): # PLACE
            
                            W   = -1.0
                            P   = place - 1.0
                            ewW = -0.5
                            ewP = P * 0.5
            
                            result[6].append([pos_str, horse_tf, bsp, place, W, P, ewW, ewP, 0, 0])
            
                        else:                     # UNPLACED
            
                            W   = -1.0
                            P   = -1.0
                            ewW = -0.5
                            ewP = -0.5
                            
                            result[7].append([pos_str, horse_tf, bsp, place, W, P, ewW, ewP, 0, 0])
                
                    except:
                        
                        pass # no results available (yet) on TF website
    
                try:
                    
                    non_runners = race.find('ul', class_='non-runner-list').find_all('a')
                
                    for nr in non_runners:
                
                        horse_tf = str(nr.string)
                    
                        # non-runners have no prices: use the 1.0 defaults rather than values left over from the last runner
                        result[8].append(['NR', horse_tf, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0, 0])
                        
                except:
                    pass
                
                # DEAD HEATS - http://en.learning.betfair.com/app/answers/detail/a_id/2516/~/what-happens-if-there-is-a-dead-heat%3F
                
                num_winners = len(result[5])
                num_placed  = len(result[6])
                
                if num_winners > 1: # Win Dead Heat
                    
                    for horse in result[5]:
                        win_odds = horse[2]
                        horse[4] = ((1/num_winners)*(win_odds-1))-((1*(num_winners-1))/num_winners) # win p/l
                        horse[6] = horse[4] * 0.5                                                   # e/w win p/l
                        horse[8] = 1                                                                # dead_heat flag
                    
                if num_winners > places: # More Winners than places so the place bet payouts for winners also need adjusting
                    
                    for horse in result[5]:
                        place_odds = horse[3]
                        horse[5] = ((1*places/num_winners)*(place_odds-1))-(1*(num_winners-places)/num_winners)  # place p/l
                        horse[7] = horse[5] * 0.5                                                                # e/w place p/l
                        horse[8] = 1                                                                             # dead_heat flag
                    
                elif (num_winners + num_placed) > places: # Place Dead Heat
                    
                    position = int(result[6][num_placed-1][0])          # position of last placed horse is the position that is tied
                    tied     = num_winners + num_placed + 1 - position  # number of horses tied for this position
                    payouts  = places + 1 - position                    # number of payouts e.g. 3rd place in a 4 place market would have 2 payout slots remaining (4+1-3)
                            
                    for horse in result[6]:
                        
                        if int(horse[0]) == position:
                            place_odds = horse[3]
                            horse[5] = ((1*payouts/tied)*(place_odds-1))-(1*(tied-payouts)/tied)    # place p/l
                            horse[7] = horse[5] * 0.5                                               # e/w place p/l
                            horse[8] = 1                                                            # dead_heat flag
    
                race_results.append(result)
           
        # compile complete results list for the day
        
        for result in race_results:
            
            for horse in result[5]: # winners
                results.append(self.__build_dict_row(result, horse))
            for horse in result[6]: # placers
                results.append(self.__build_dict_row(result, horse))
            for horse in result[7]: # losers
                results.append(self.__build_dict_row(result, horse))
            for horse in result[8]: # non runners
                results.append(self.__build_dict_row(result, horse))
                
        return results
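
The places rule and the dead-heat adjustment in the code above can be pulled out into two small functions, which makes them easy to check by hand. This is only a sketch: the names places_paid and dead_heat_pl are made up for illustration, the thresholds are copied from the code above, and the P/L is for a 1 unit stake at decimal odds.

```python
def places_paid(runners, handicap):
    """Number of places paid, using the same thresholds as the code above."""
    if runners >= 16 and handicap:
        return 4
    if runners >= 8:
        return 3
    if runners >= 3:
        return 2
    return 1

def dead_heat_pl(odds, tied, payouts):
    """P/L on a 1 unit stake when `tied` horses share `payouts` paying slots."""
    won_fraction = payouts / tied             # part of the stake settled as a winner
    lost_fraction = (tied - payouts) / tied   # part of the stake settled as a loser
    return won_fraction * (odds - 1) - lost_fraction

print(places_paid(16, True))     # 16 runner handicap
print(dead_heat_pl(5.0, 2, 1))   # two-way dead heat for first at odds of 5.0
```

With two horses tied for one winning slot at odds of 5.0, half the stake wins 4.0 per unit and half is lost, so the result is 0.5 * 4.0 - 0.5 = 1.5.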


guiness

rookie botter

Posts: 39 Member Since:August 28, 2012

#6 [url]

November 18, 2013 12:39:33

Hi cran, thanks very much. I just tried to compile this and got 'No module named adn_functions'. Is this one of yours?!


cran

bot addict

Posts: 72 Member Since:June 11, 2013

#7 [url]

November 18, 2013 18:59:54

Yeah, there are also functions it calls, like __build_dict_row(), that it needs, plus an SQL database... Sorry, it wasn't meant to be a compilable program to run, just an example of the code.

If I get some time on Thursday/Friday I'll convert it into something simple that runs.
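
For the "between any two dates" part, a minimal sketch of walking a date range and building one day-page URL per day. It only builds the URLs and fetches nothing. It assumes the daypage?date=YYYYMMDD pattern from the code above still holds, and the name daypage_urls is made up for illustration.

```python
from datetime import date, timedelta

TF_URL = 'http://form.horseracing.betfair.com/'  # base URL taken from the code above

def daypage_urls(start, end):
    """Yield (day, url) for every day from start to end inclusive."""
    day = start
    while day <= end:
        yield day, TF_URL + 'daypage?date=' + day.strftime('%Y%m%d')
        day += timedelta(days=1)

for day, url in daypage_urls(date(2013, 11, 1), date(2013, 11, 3)):
    print(day, url)
```

Each URL could then be fetched and parsed in the same way get_race_results() handles a single day.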
 
