PORTLAND, Ore. - Engineers have crafted an algorithm that appears to predict the outcome of the NCAA Men's Division I Basketball Championship better than the experts, the sports writers, the polls and other computer models can. Working from historical data, the algorithm posted an 83 percent accuracy rate over the last nine years of NCAA brackets. This year, it predicts a victory by Kansas. The algorithm is but one win away from being right.
"Our system objectively measures each team's performance in every game it plays, and mathematically balances all of those outcomes to determine an overall ranking," said the algorithm's co-inventor, Joel Sokol. He wrote the algorithm with Paul Kvam and maintains it with George Nemhauser. All three are engineering professors at Georgia Tech.
The professors attribute the accuracy of their algorithm to its impartial, emotionless consideration of game results as well as its novel approaches to the home court advantage and close-game scores.
"Gut feelings about a team are really colored by how well or poorly they played the few times we've been watching," said Sokol.
Check the gut
Instead of gut feelings, the Georgia Tech algorithm, called Logistic Regression Markov Chain (LRMC), uses only scoreboard data, home court advantage and margin of victory. In its treatment of home court advantage, the novelty lies in estimating how much playing at home helps a team win, rather than how much playing at home helps a team score points. Other computer models weigh points on a home court differently from those scored away from home.
The second novelty of the Georgia Tech algorithm is its handling of close games as toss-ups, since close scores involve more luck than skill and thus are a poor indicator of which team is better. Other computer models rank close games as wins of equal merit to blowouts. The Georgia Tech engineers, however, argue that losing a close game should not count against a team as much as losing by a landslide.
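The two ideas above can be illustrated with a minimal Python sketch. It is not the professors' implementation: the coefficients SLOPE and HOME_BIAS, the function names and the sample games are all hypothetical, chosen only to show how a logistic curve can turn a margin of victory into a probability and how a Markov chain can balance those probabilities into a ranking.

```python
import math

# Hypothetical coefficients for illustration only; they are not from the article.
SLOPE = 0.1      # how fast confidence grows with the margin of victory
HOME_BIAS = 0.3  # home court discount, expressed in log-odds of winning, not points

def prob_home_team_better(home_margin):
    """Logistic estimate that the home team is the better team.

    home_margin is home score minus away score. Margins near zero give
    probabilities near 0.5, so close games behave like toss-ups.
    """
    return 1.0 / (1.0 + math.exp(-(SLOPE * home_margin - HOME_BIAS)))

def rank_teams(games, iterations=1000):
    """games is a list of (home, away, home_margin); returns {team: rating}."""
    teams = sorted({t for home, away, _ in games for t in (home, away)})
    index = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    transition = [[0.0] * n for _ in range(n)]
    games_played = [0] * n

    # Each game moves an imaginary voter toward the team that was probably better.
    for home, away, margin in games:
        h, a = index[home], index[away]
        p = prob_home_team_better(margin)
        transition[h][h] += p          # voter at home team stays
        transition[h][a] += 1.0 - p    # voter at home team switches
        transition[a][h] += p          # voter at away team switches
        transition[a][a] += 1.0 - p    # voter at away team stays
        games_played[h] += 1
        games_played[a] += 1

    # Average over each team's games so every row sums to 1.
    for i in range(n):
        transition[i] = [w / games_played[i] for w in transition[i]]

    # Power iteration: the long-run share of voters at each team is its rating.
    rating = [1.0 / n] * n
    for _ in range(iterations):
        rating = [sum(rating[i] * transition[i][j] for i in range(n))
                  for j in range(n)]
    return dict(zip(teams, rating))

# Made-up results: two blowouts and one 1-point road win.
example_games = [("A", "B", 20), ("B", "C", 15), ("C", "A", -1)]
ratings = rank_teams(example_games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(ratings[team], 3))
```

In this sketch a 20-point home win shifts most of the weight toward the winner, while a 1-point result barely moves it, which is the toss-up behavior described above.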
As a result, the Georgia Tech algorithm correctly picked all four of this year's finalists, and identified 30 of the last 36 Final Four participants as one of the top two teams in their region. During the same period, the seeds and polls correctly identified only 23 out of 36, and the ratings percentage index identified only 21 out of 36. (To be fair, of course, this year's No. 1 seeds all made it to the Final Four; this is a first, so the seeding committee did its job well.)
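Expressed as percentages, the Final Four counts quoted above can be checked with a few lines of Python. The variable names are illustrative, and the link to the 83 percent accuracy rate cited earlier is an inference from the arithmetic (nine tournaments of four finalists each gives 36), not something the engineers state.

```python
# Final Four participants identified as a top-two team in their region,
# out of 36 (nine tournaments x four finalists), as quoted in the article.
TOTAL = 36
identified = {"LRMC": 30, "seeds and polls": 23, "RPI": 21}

# Convert each count to a whole-number percentage.
rates = {name: round(100 * hits / TOTAL) for name, hits in identified.items()}
for name, pct in rates.items():
    print(f"{name}: {pct} percent")
```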
The Georgia Tech algorithm is also better at identifying overrated and underrated teams, according to the engineers. This year it identified Drake, Vanderbilt and Connecticut (all of which were upset in the first round) as overrated. Second-round loser Georgetown was also pegged as overrated.
The Georgia Tech algorithm picked West Virginia (which defeated Duke) and Kansas State (which defeated highly ranked USC) as underrated.
On the downside, the LRMC algorithm mistakenly picked Clemson as underrated (the Tigers lost in the first round), and failed to identify Davidson as underrated. However, none of the other methods (including the AP poll of sportswriters, the ESPN/USA Today poll of coaches, the Ratings Percentage Index (RPI), the Massey Ratings and the Sagarin computer model) successfully predicted Davidson's run to the Elite Eight either.