My Sports Betting Algorithm Might Be Bringing In Serious Dollars.
A deep dive into the inner workings of a quantitative sports betting algorithm.
Finding new ways to make money in markets is a difficult task, so naturally, I like to explore other areas where I can use math and programming to turn a profit. Fortunately, sports betting has enough overlap with quantitative trading to make it a perfect fit.
I’ll first go over the prediction algorithm, then what kinds of bets we can place and how they pay off, and last but not least, I’ll get down to the dollars and cents that matter most of all.
In order to set up the betting operation, I first needed something that would tell me what to bet. Coming from a quant finance background, I knew that I had to use data that would be useful in determining a particular outcome. But coming from a finance background, I also knew nothing about sports, so I just went with basic stats to start with.
I want to put money on the line for a full season, and the next major betting sport is Major League Baseball (MLB), which starts in April, giving me ample time to prepare a working prototype. To avoid overcomplicating things and just to get my feet wet, I decided to use the run differential as my primary statistic.
The run differential is calculated by subtracting runs allowed from runs scored. So, for example, if Team A allows 50 runs over the entire season but scores 100 runs against all the other teams, their run differential is +50 (runs scored (100) - runs allowed (50)). Surprisingly, a given team’s run differential is strongly correlated with that team’s win percentage:
So this sets the baseline theory for the model: If Team A has a higher run differential than Team B, bet that Team A will be the winner.
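As a minimal sketch of that baseline rule in Python: the function names here are my own, purely for illustration, and what to do when both teams have the same run differential isn't something the rule itself answers, so this version simply declines to pick.

```python
def run_differential(runs_scored, runs_allowed):
    """Runs scored minus runs allowed over some stretch of games."""
    return runs_scored - runs_allowed


def pick_winner(team_a, diff_a, team_b, diff_b):
    """Baseline rule: back the team with the higher run differential.

    Returns None on a tie (an assumption; the rule doesn't cover ties).
    """
    if diff_a == diff_b:
        return None
    return team_a if diff_a > diff_b else team_b


# Team A from the example above: 100 runs scored, 50 allowed.
diff_a = run_differential(100, 50)   # +50
# A hypothetical Team B that scored 80 and allowed 90.
diff_b = run_differential(80, 90)    # -10
print(pick_winner("Team A", diff_a, "Team B", diff_b))  # Team A
```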
To see how this relationship holds up, I pulled data from 1998-2022 and checked how often the run differential correctly predicted the winner. To make sure no future information leaked into a prediction, the run differential for each matchup is calculated using only each team’s season data up to that game. For example, if Team A plays Team B on June 7, 2022, I used data from March 31, 2022 (season start) through June 6, 2022 to generate the prediction.
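The "only data up to that game" part is the easy one to get wrong, so here is a rough sketch of how such a backtest loop could look. The input format (tuples of home team, away team, home runs, away runs, already sorted by date within one season) is an assumption for illustration, not the actual data layout I used, and games where the two teams are tied on run differential are skipped.

```python
from collections import defaultdict


def backtest_run_diff(games):
    """Count how often the higher season-to-date run differential wins.

    games: iterable of (home, away, home_runs, away_runs) tuples in date
    order within a single season (hypothetical format).
    Returns (correct_picks, total_picks).
    """
    diff = defaultdict(int)  # each team's run differential so far
    correct = 0
    total = 0
    for home, away, home_runs, away_runs in games:
        # Predict first, using only games played before this one.
        if diff[home] != diff[away]:
            pick = home if diff[home] > diff[away] else away
            winner = home if home_runs > away_runs else away
            correct += int(pick == winner)
            total += 1
        # Only now fold this game's result into the running totals.
        margin = home_runs - away_runs
        diff[home] += margin
        diff[away] -= margin
    return correct, total


# Three made-up games between two teams.
sample = [("A", "B", 5, 2), ("A", "B", 1, 3), ("B", "A", 0, 4)]
print(backtest_run_diff(sample))  # (1, 2)
```

The first game produces no pick because both teams start the season at zero; the second pick (A) loses and the third pick (A) wins.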
Here’s how that turned out:
In a sample of 16,695 games, the team with the higher run differential won 58% of the time. So just using one statistic, we’re able to accurately predict the game winners more often than not on a long-run basis. Can we monetize that?
The primary way to bet on the winner of a game is through a bet known as the “moneyline”. Most American sportsbooks quote odds that represent the payout of the bet relative to $100. Negative odds mark the “favorite”: odds of -115, for example, mean that you have to risk $115 to win $100 in profit. Positive odds mark the “underdog”: odds of +110 mean that a $100 bet wins $110 in profit. In both cases, a winning bet also gets its stake back.
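To make the arithmetic concrete, here is a small sketch that converts American odds into the profit on a winning bet. The function name is mine; the formula is just the standard reading of American odds described above.

```python
def profit_on_win(american_odds, stake):
    """Profit (excluding the returned stake) if a bet at these odds wins."""
    if american_odds < 0:
        # Favorite: risk |odds| to win 100.
        return stake * 100 / -american_odds
    # Underdog: risk 100 to win the quoted odds.
    return stake * american_odds / 100


print(profit_on_win(-115, 115))  # 100.0
print(profit_on_win(110, 100))   # 110.0
```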
The sportsbook uses similar models, likely with more advanced statistics, but nevertheless, they have an estimate of who will win the game and price the odds accordingly. Sometimes, when there is a statistical mismatch and the game winner seems obvious, the favorite may have odds that extend to -300. This means that you would need to bet $300 for a $100 profit.
As you can likely imagine, this leads to a scenario where even if you have a high win rate, each winning bet on a favorite earns less than the amount you risked. At -200, for example, a $100 bet wins $50, so you make $200 by winning 4 times in a row, but lose twice and you’re back to where you started. When you lose a sports bet, you lose the entire stake, unlike with stocks where you typically lose only a percentage, so the odds are the most important thing.
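One way to put a number on this is the break-even win rate: the share of bets you need to win at given odds just to end up flat. This is a standard calculation, sketched here with a function name of my own choosing.

```python
def break_even_win_rate(american_odds):
    """Win rate at which flat bets at these odds neither make nor lose money."""
    if american_odds < 0:
        risk, win = -american_odds, 100
    else:
        risk, win = 100, american_odds
    # Break-even when p * win == (1 - p) * risk.
    return risk / (risk + win)


print(round(break_even_win_rate(-200), 3))  # 0.667
print(round(break_even_win_rate(-300), 3))  # 0.75
print(round(break_even_win_rate(110), 3))   # 0.476
```

At -200 that is exactly the "win four, lose two" pattern above: 4 wins out of 6 bets is 66.7%, and it only gets you back to zero.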
Using data from June 2020 (earliest available via “TheOddsAPI”) to October 2022, this model correctly picked the game winner 61% of the time and bet at average odds of -139. Despite this high win rate, using a fixed bet size of $100 per game resulted in a loss of $9,000. In reality, the loss wouldn’t be that large, as I would have to stop betting the first time the bankroll hit zero. But still, it goes broke.
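For reference, a flat-stake backtest of this kind boils down to something like the sketch below. The bet list is made-up data chosen to show how a winning record can still lose money when the wins come at short odds; it is not my actual bet history, and the function name is only illustrative.

```python
def flat_stake_pnl(bets, stake=100.0):
    """Total profit/loss from betting a fixed stake on every game.

    bets: iterable of (american_odds, won) pairs (hypothetical format).
    """
    total = 0.0
    for american_odds, won in bets:
        if not won:
            total -= stake  # a losing bet costs the whole stake
        elif american_odds < 0:
            total += stake * 100 / -american_odds
        else:
            total += stake * american_odds / 100
    return total


# Made-up example: 3 wins on heavy favorites, 2 losses on mild favorites.
bets = [(-300, True), (-300, True), (-300, True), (-120, False), (-120, False)]
print(round(flat_stake_pnl(bets), 2))  # -100.0, despite a 60% win rate
```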
However, there are other options.
Sportsbooks offer multiple types of bets for each game, the moneyline being just one of them. The next most common bet type is the point spread, which is a bet on the margin of victory. For example, if the favorite is Team A, a point spread of -1.5 means that Team A must beat Team B by 2 or more runs. The opposite side of this bet is Team B at +1.5, which pays if Team B wins outright or loses by exactly 1 run.
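Settling a spread bet is a one-liner once you think of the spread as a handicap added to your team's margin. A quick sketch, with a function name of my own:

```python
def covers_spread(team_runs, opponent_runs, spread):
    """True if a bet on `team` at the given spread wins.

    The spread is added to the team's margin of victory; with half-run
    spreads like -1.5 / +1.5 the adjusted margin can never be exactly zero.
    """
    return (team_runs - opponent_runs) + spread > 0


# Team A at -1.5 needs to win by 2 or more.
print(covers_spread(5, 3, -1.5))  # True
print(covers_spread(4, 3, -1.5))  # False
# Team B at +1.5 wins the bet even when losing the game by exactly 1.
print(covers_spread(3, 4, +1.5))  # True
```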
Most often, the point spread bets are at -1.5 / +1.5 for baseball. The odds are almost always close to even, so you’d see odds of -110/+110.
Sportsbooks also give you the option to make what are known as parlays. A parlay is a combination of bets that must all win, which results in a higher payoff. So for example, we can combine a moneyline bet that offers odds of -150 and a point spread bet that offers odds of -110, and the sportsbook will give us odds for that combined bet of +218. That calculation is a bit technical, so I will point you to this calculator/guide.
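For the curious, the usual way to get there is to convert each leg to decimal odds, multiply, and convert back. The sketch below reproduces the +218 figure; the function names are mine, and an individual sportsbook may round or price a given parlay differently.

```python
def american_to_decimal(odds):
    """Decimal odds = total return (stake + profit) per 1 unit staked."""
    if odds < 0:
        return 1 + 100 / -odds
    return 1 + odds / 100


def decimal_to_american(decimal_odds):
    """Convert decimal odds back to American odds, rounded to an integer."""
    if decimal_odds >= 2:
        return round((decimal_odds - 1) * 100)
    return round(-100 / (decimal_odds - 1))


def parlay_odds(*legs):
    """Combined American odds for a parlay priced by multiplying the legs."""
    decimal = 1.0
    for leg in legs:
        decimal *= american_to_decimal(leg)
    return decimal_to_american(decimal)


print(parlay_odds(-150, -110))  # 218
```

Multiplying the legs like this prices them as if they were independent events.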
When we bet only that way, things start to look a lot better. Pictured are the results of taking bets that contain a game winner + point spread. As in the previous example, there was a fixed bet size of $100.
However, there is a catch. This is known as a correlated parlay: the two legs are not independent, because a team that covers -1.5 has necessarily won the game. If there is a statistical mismatch, the winner is also more likely to blow the other team out and win by two or more runs. Most sportsbooks know this, so they don’t allow correlated bets. The sportsbooks that do accept these bets usually aren’t taking bets from people who know about correlated bets. And they aren’t called sportsbooks, but rather, “Bookies”. Let’s just say you probably don’t want to place action with one of those guys.
The main takeaway is that when accounting for the sportsbooks’ “vig”, “juice”, “margin”, or in other words, the odds offered, there isn’t much edge in betting before the game happens. You can make quite a pretty penny with correlated parlays, but you’d be hard-pressed to find a book that will continue to allow you to.
However, I do believe that there is an edge to be captured by attacking this from a trader’s perspective. Sportsbooks don’t stop taking bets until a few minutes before the game ends, and oftentimes the odds offered fluctuate up and down throughout the match. The next stage of the algorithm will have to be more active and look for mispricing opportunities as the game happens.
For example, if the statistical “star” player of a given team is fouled out mid-game, then the odds of the match will change significantly. This change may be a good time to enter, as the odds will be closer to even (-100/+100) considering that the outcome is much less clear. This solves the problem of paying too much on easy bets (<= -150 odds).
Thankfully, I’m armed with enough data to test and tweak this into a feasible version. MLB season will be starting soon, so the faster I build this out, the sooner I can get to putting real money on the line.
That is one heck of an incentive :)