Wednesday, June 6, 2007

And the winner is...

Economics provides a unique perspective on many ordinary things that occur in our lives. I wrote my college capstone on the Economics of Drinking in College. A very popular book called "Freakonomics" looks at things such as the decline of crime in the 1990s or cheating in sumo wrestling from an economic point of view. It isn't all about numbers and supply and demand. Sometimes the most fascinating research in the field involves things that no one really thinks about, aside from minds like Malcolm Gladwell's. And sometimes economics can relate to sports.

Rebecca Sparks and David Abrahamson, from Rhode Island College, wrote an article a couple of years back about predicting Cy Young Award winners. While it is mostly a math equation, it assigns a value to different pitching categories to determine who will win the award each season. I wouldn't necessarily call it groundbreaking, but it is interesting.

Sparks and Abrahamson chose five categories for their study: wins, losses, strikeouts, ERA, and team winning percentage all went into the equation. By looking at past winners, they found out which categories seem to be more valuable to the voters. Wins proved to be most valuable, followed by earned run average and strikeouts. Team winning percentage showed very minimal value, and losses seemed insignificant to the voting. This study didn't represent what the authors thought should be the case, only what proved to be reality in voting for the Cy Young Award.

As you can see, relief pitchers don't come into play in this study, although that may not matter. Since 1990, only two relievers have won the award: Dennis Eckersley in 1992 with Oakland and Eric Gagne in 2003 with Los Angeles. Both pitchers posted 50+ save seasons with ERAs under 2.00, but that doesn't seem to be a be-all and end-all for the award. Gagne also posted 52 saves with a sub-2.00 ERA in 2002, and John Smoltz had 55 saves the same year. Neither won the award.

The equation looks as follows:

Wins score = W/3

Losses score = 10(15 - L)/15

Strikeouts score = 10(K - 50)/333

ERA score = 12.5 - 2.5(ERA)

Team win % score = 20(TWP - 0.25)

Add those together for all the top pitchers, apply some further math that I do not understand, and you can predict the Cy Young winner. The researchers looked at both leagues from 1993 to 2002 and correctly predicted the top three in each league, except for the 1995 American League winner. You can also see just how good each individual pitching season really was based on the point values. According to this study, the two best seasons in that time were Randy Johnson in 2002 (7.73261) and Pedro Martinez in 1999 (7.54405).

2002 Randy Johnson 24-5, 2.32 ERA, 334 K
1999 Pedro Martinez 23-4, 2.07 ERA, 313 K
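
For anyone who wants to play along, here is a rough Python sketch of just the five category scores listed earlier. It assumes each line is meant to land on a scale of roughly 0 to 10, so the losses and strikeouts lines are read as 10(15 - L)/15 and 10(K - 50)/333. The function name and arguments are made up for illustration, and the sketch stops short of the final number because the weights that combine the categories are not given here. Team winning percentage is left optional since the stat lines above don't include it.

```python
def component_scores(wins, losses, strikeouts, era, team_win_pct=None):
    """Per-category scores from the five equations, each on a roughly 0-10 scale.

    Assumption: the losses and strikeouts equations are read as
    10 * (15 - L) / 15 and 10 * (K - 50) / 333.
    """
    scores = {
        "wins": wins / 3,
        "losses": 10 * (15 - losses) / 15,
        "strikeouts": 10 * (strikeouts - 50) / 333,
        "era": 12.5 - 2.5 * era,
    }
    # Team winning percentage is only scored when the caller supplies it.
    if team_win_pct is not None:
        scores["team_win_pct"] = 20 * (team_win_pct - 0.25)
    return scores


# Stat lines taken from the two seasons listed above.
johnson_2002 = component_scores(wins=24, losses=5, strikeouts=334, era=2.32)
martinez_1999 = component_scores(wins=23, losses=4, strikeouts=313, era=2.07)

for name, scores in [("Johnson 2002", johnson_2002), ("Martinez 1999", martinez_1999)]:
    print(name, {k: round(v, 2) for k, v in scores.items()})
```

These are only the raw category scores; the published season values (7.73261 and 7.54405) come from the weighting step that this sketch does not attempt.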

While I do a lot of boring number-crunching to please myself in sports, I am not sure I would use this all the time. Am I really going to calculate the top pitchers in each league and see who will win the award? Well, yes, I probably would. But I don't have to!

Rob Neyer and Bill James (yes, the Red Sox numbers guy) created a similar equation to predict the Cy Young race, and you can view it on ESPN.com anytime you wish. Their formula looks like this:

Cy Young Points (CYP) = ((5*IP/9)-ER) + (SO/12) + (SV*2.5) + Shutouts + ((W*6)-(L*2)) + VB (see below)

Victory Bonus (VB): A 12-point bonus awarded for leading your team to the division championship (pro-rated based on the current standings).

See, it's easy.
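
If you would rather let a computer do it, the Cy Young Points formula translates almost word for word into Python. This is only a sketch: the function and argument names are mine, the Victory Bonus is passed in as a plain number because the pro-rating method isn't spelled out here, and the two stat lines are invented for illustration, not real pitchers.

```python
def cy_young_points(ip, er, so, sv, shutouts, wins, losses, victory_bonus=0.0):
    """Cy Young Points as written in the formula above.

    victory_bonus is whatever share of the 12-point bonus applies; how it is
    pro-rated is not specified here, so the caller has to supply it.
    """
    return (
        (5 * ip / 9 - er)          # innings pitched versus earned runs
        + so / 12                  # strikeouts
        + sv * 2.5                 # saves
        + shutouts
        + (wins * 6 - losses * 2)  # win-loss record
        + victory_bonus
    )


# Hypothetical stat lines, made up purely to exercise the formula.
starter = cy_young_points(ip=180, er=60, so=180, sv=0, shutouts=1, wins=15, losses=8)
closer = cy_young_points(ip=72, er=16, so=84, sv=40, shutouts=0, wins=4, losses=2)

print("starter:", starter)
print("closer:", closer)
```

With these invented numbers, the 40 saves alone are worth 100 points, which is more than the starter earns from his entire win-loss record.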

Anyway, looking at old standings from the Cy Young Predictor (as they call it), it is not as accurate as Sparks and Abrahamson's model. It correctly predicted Johan Santana to win the award last season in the A.L. but chose Billy Wagner in the N.L., where he actually finished 6th in the voting. Brandon Webb won the award but finished 3rd in the James-Neyer rankings.

2005 provided similar results: the formula correctly predicted Chris Carpenter in the N.L. but chose Mariano Rivera over Bartolo Colon in the A.L., and the actual finish flipped the two pitchers. James and Neyer also got Santana right in 2004 but chose Eric Gagne in the N.L., where Roger Clemens finished 1st (he was 3rd in the Neyer/James standings). It appears that Neyer and James put too much value on relievers in their formula. If they took closers out of their standings completely, they would have correctly predicted all but one Cy Young winner in the past three years.

Looking at this season, then, the A.L. standings are led by John Lackey, C.C. Sabathia, Dan Haren, and Josh Beckett. The N.L. reads Jake Peavy, Francisco Cordero, Brad Penny, and Cole Hamels. Judging by the past three years, you have to dismiss Cordero. So as of now, John Lackey and Jake Peavy will win the awards. Peavy seems like an easy choice, but many would argue against Lackey.

Too bad I don't understand the rest of the Sparks and Abrahamson math, or we could find out for sure.

Maybe you can do it. Look here for more information on both.

Sparks and Abrahamson

Cy Young Predictor
