math - Algorithm to calculate the odds of a team winning a sports match given full history
Assumptions:

- The teams never change.
- The teams don't improve in skill.
- The entire history of each team's performance against a subset of the other teams is known.
- The number of games played between teams is large, but potentially sparse (each team hasn't played every other team).

For example:
I have a long list of match outcomes like this:

Team A beats Team B
Team B beats Team A
Team A beats Team B
Team C beats Team A
Team A beats Team C

Problem: predict the right betting odds of one team beating another.
In the illustration above, we might conclude that A should beat B 66% of the time. Based on direct observation, that's pretty straightforward. However, finding the probability that C beats B seems harder. They've never played each other, yet it seems that C > B, albeit with low confidence.
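The direct-observation case can be sketched as a simple count over the match list. This is only a sketch of the "66%" calculation above; the function name and the `(winner, loser)` tuple representation are my own choices, not from the original post.

```python
from collections import Counter

# Match outcomes from the example above, as (winner, loser) pairs.
matches = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "A"), ("A", "C")]

def direct_win_rate(matches, x, y):
    """Fraction of direct meetings between x and y that x won.

    Returns None when the two teams have never played each other,
    which is exactly the C-vs-B gap described in the text.
    """
    wins = Counter()
    for winner, loser in matches:
        if {winner, loser} == {x, y}:
            wins[winner] += 1
    total = wins[x] + wins[y]
    return wins[x] / total if total else None

print(direct_win_rate(matches, "A", "B"))  # A won 2 of 3 meetings -> 0.666...
print(direct_win_rate(matches, "C", "B"))  # never met -> None
```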
Research I've done:
I've read a fair bit about different ranking systems for games of skill, such as the Elo and Glicko rating systems for chess. These fall short because they make assumptions about the probability distributions involved. For example, Elo's central assumption is that the chess performance of each player in each game is a normally distributed random variable. However, according to Wikipedia, there are other distributions that fit the existing data better.
I don't want to assume a distribution. It seems to me that with 10,000+ match results on hand I should be able to either deduce the distribution from the evidence (I don't know how to do this), or utilize some sort of reinforcement learning scheme that doesn't care what the distribution is.
You want to create a best estimate of a probability (or multiple probabilities) and continuously update that estimate as more information becomes available. That calls for Bayesian inference! Bayesian reasoning is based on the observation that the probability (distribution) of two things, a and b, being the case at the same time is equal to the probability (distribution) of a being the case given that b is the case, times the probability that b is the case. In formula form:
p(a,b) = p(a|b)p(b)
and also
p(a,b) = p(b|a)p(a)
and hence
p(a|b)p(b) = p(b|a)p(a)
Move p(b) to the other side and you get the Bayesian update rule:
p(a|b)' = p(b|a)p(a)/p(b)
Usually a stands for whatever variable you are trying to estimate (e.g. "team X beats team Y") while b stands for your observations (e.g. the total history of matches won and lost between the teams). I wrote a prime (i.e. the quote in p(a|b)') to signify that the left-hand side of the equation represents an update of your beliefs. To make this concrete: your new estimate of the probability that team X will beat team Y, given your observations so far, is the probability of making those observations given your previous estimate, times that previous estimate, divided by the overall probability of seeing the observations you have seen (i.e. given no assumptions about the relative strength of the teams; one team winning most of the time is less likely than both teams winning about as often).
The p(a|b)' on the left-hand side of the current update becomes the new p(a) on the right-hand side of the next update. You keep repeating this as more information comes in. Typically, in order to be as unbiased as possible, you start with a flat distribution for p(a). Over time p(a) will become more and more certain, although the algorithm should be able to deal with sudden changes in the underlying probability you're trying to estimate (e.g. if team X becomes much stronger because a new player joins the team).
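The update loop described above can be sketched without committing to any parametric family by discretizing p = P(X beats Y) on a grid, starting from a flat prior, and renormalizing after each observed match. The grid size and the example result sequence below are my own illustrative choices.

```python
# Discretize p = P(team X beats team Y) on a grid, start from a flat prior p(a).
N = 999
grid = [(i + 1) / (N + 1) for i in range(N)]  # candidate values of p in (0, 1)
posterior = [1.0 / N] * N                     # flat prior

# Feed in results one at a time: True = X won, False = Y won.
# Using A's record against B from the example: win, loss, win.
results = [True, False, True]
for x_won in results:
    # p(b|a): likelihood of this single match outcome for each candidate p.
    posterior = [w * (p if x_won else 1.0 - p) for w, p in zip(posterior, grid)]
    total = sum(posterior)                    # dividing by p(b) = renormalizing
    posterior = [w / total for w in posterior]

# Posterior mean as a point estimate; ~0.6 after two wins and one loss.
estimate = sum(p * w for p, w in zip(grid, posterior))
print(estimate)
```

Each pass through the loop is one application of p(a|b)' = p(b|a)p(a)/p(b): the posterior from the previous match becomes the prior for the next one.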
The good news is that Bayesian inference works nicely with the beta distribution elkamina mentioned. In fact the two are often combined in artificial intelligence systems meant to learn a probability distribution. While the beta distribution is in itself still an assumption, it has the advantage that it can take many forms (including flat and extremely spiky), so there's relatively little reason to be concerned that the choice of distribution might be affecting your outcome.
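With a beta prior the Bayesian update has a closed form: Beta(1, 1) is the flat prior, and each observed win or loss just increments one of the two parameters. A minimal sketch, with the win/loss counts taken from the A-vs-B example:

```python
# Conjugate beta updating: the flat prior over P(X beats Y) is Beta(alpha=1, beta=1),
# and each match result simply increments one counter. No numerical grid needed.
alpha, beta = 1.0, 1.0   # flat prior
wins, losses = 2, 1      # A's record against B in the example above

alpha += wins
beta += losses

# Posterior is Beta(3, 2); its mean is alpha / (alpha + beta) = 0.6.
posterior_mean = alpha / (alpha + beta)
print(posterior_mean)
```

This gives the same answer as grinding through the update rule numerically, which is why the beta distribution and Bayesian inference combine so well here.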
One piece of bad news is that you will still need to make assumptions, apart from the beta distribution. For example, suppose you have the following variables:
a: team X beats team Y
b: team Y beats team Z
c: team X beats team Z
and you have observations from direct matches between X and Y, and between Y and Z, but no matches between X and Z. A simple (though naive) way to estimate p(c) would be to assume transitivity:
p(c) = p(a)p(b)
Regardless of how sophisticated your approach, you'll have to define some kind of structure of probabilities to deal with the gaps and interdependencies in the data. Whatever structure you choose, it is an assumption.
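The naive transitive estimate above can be sketched as follows. This implements exactly the p(c) = p(a)p(b) heuristic from the text, and only that; the function name and example inputs are my own.

```python
def transitive_estimate(p_xy, p_yz):
    """Naive transitivity from the text: P(X beats Z) = P(X beats Y) * P(Y beats Z).

    A gap-filling heuristic, not a proper probability model: it treats
    "X beats Z" as if it required winning both links of the chain, so the
    estimates it produces are systematically low and not symmetric.
    """
    return p_xy * p_yz

# E.g. A beats B 66% of the time and B beats Z 70% of the time (made-up number):
print(transitive_estimate(0.66, 0.7))  # ~0.46
```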
Another piece of bad news is that this approach gets plain complicated, and I cannot give a full account of how to apply it to your problem. Given that you need a structure of interdependent probabilities (the probability of team X beating team Y given other distributions involving teams X, Y and Z), you may want to use a Bayesian network or a related analysis (for example a Markov random field or path analysis).
I hope this helps. In any case, feel free to ask for clarifications.
algorithm math statistics game-theory