A perfect NCAA bracket? NDSU statisticians use data to predict men’s basketball tournament winners
FARGO — While March Madness may be a gambler’s paradise with 63 games in less than three weeks, the NCAA men’s basketball tournament is also a golden opportunity for statisticians to develop and test their models.
For the last three years, the statistics department at North Dakota State University has experimented with models to predict the outcome of tournament games based solely on data and probability, said Rhonda Magel, the department chairwoman.
This year, Magel, assistant professor of statistics Gang Shen, and students Yingfei Mu and Bryan Rask developed a model and created a bracket based on their predictions. It’s on display in Morrill Hall.
They factored in a wealth of information about the teams such as average score margins, average assists per game, assist-to-turnover ratio, strength of schedule and Pomeroy’s ratings.
For the Final Four, they’ve picked Florida, Virginia, Arizona and Louisville. Their model gives Arizona a 27 percent probability of winning the championship.
Their Final Four predictions line up pretty well with famed statistician Nate Silver’s model, which shows Louisville having the greatest chance of winning the tournament, followed by Florida and Arizona.
While the odds of creating a perfect bracket are roughly 1 in 9.2 quintillion, the number-crunchers at NDSU are aiming for 70 percent accuracy, which is no easy feat in the later rounds of the tournament, when the matchups are almost too close to call.
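The 9.2 quintillion figure is consistent with treating each of the 63 games as an independent coin flip, which gives 2 to the 63rd power possible brackets. The short Python sketch below reproduces that arithmetic. The coin-flip assumption and the function name are illustrative only and are not part of the NDSU model.

```python
# Assumption (ours, for illustration): every game is an independent 50-50 coin flip.
GAMES = 63  # number of games in the bracket, per the article


def bracket_count(games: int) -> int:
    """Number of distinct ways to pick a winner in every game."""
    return 2 ** games


total = bracket_count(GAMES)
print(f"{total:,} possible brackets")  # 9,223,372,036,854,775,808 possible brackets
```

A bracket picked at random would therefore match the real results about once in 9.2 quintillion tries.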
“Prediction accuracy using our model is between 60 and 70 percent,” said Mu, a graduate student who helped develop a model for the men’s tournament. “We haven’t found a model other than ours that has a prediction accuracy of more than 70 percent. I think it’s very difficult to get very accurate bracketing.”
In some of the later rounds of the tournament — such as the Elite Eight and Final Four — it’s almost 50-50. Magel said if Virginia and Florida meet in the penultimate round of the tournament, it’s down to the thousandths place, or a “very, very, very, very small difference” between the two teams.
Unlike President Barack Obama’s bracket, the NDSU model favors Oklahoma over NDSU in the first round. The probability of NDSU pulling an upset is one-third.
Magel said they called the game for Oklahoma strictly based on the numbers, but they hope the Bison win.
“If I were filling out a bracket without factoring in our model, I would definitely put NDSU in there just because I am hopeful,” she said. “A one-third probability is still a probability, and that’s why it’s so hard to get every one of 63 games correct.”
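A rough sketch shows how quickly per-game uncertainty compounds over 63 games. It assumes every pick is independent and has the same chance of being right, which is a simplification and not how the NDSU model is described. The function name is hypothetical.

```python
# Assumption (ours): each of the 63 picks is independent and equally likely to be correct.
def perfect_bracket_probability(per_game_accuracy: float, games: int = 63) -> float:
    """Chance of getting every game right at a fixed per-game accuracy."""
    return per_game_accuracy ** games


for accuracy in (0.5, 0.6, 0.7):
    p = perfect_bracket_probability(accuracy)
    print(f"{accuracy:.0%} per game -> about 1 in {1 / p:,.0f}")
```

Even at 70 percent per game, the chance of a perfect bracket under these assumptions is on the order of one in several billion.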
And for March Madness, there are many things statistical modeling can’t possibly account for, such as when Louisville’s Kevin Ware broke his leg in the Elite Eight round of the tournament last year. The team still won over Duke and went on to win the tournament, but the dramatic injury could have affected team play, Magel said.
“Those things you just can’t predict,” she said.
Mu said that while she’s more of an NBA fan, the NCAA tournament is a fun way to practice statistical modeling.
“It’s more fun because you can compare your prediction results with real results,” she said.
Though they’ve developed the model for sports, Magel said the same kind of models has applications in fields like medicine, where they can be used to estimate cancer survival rates.
“You can carry it on throughout various areas, but using it for sports is fun. There’s a lot of interest in the NCAA,” she said.