Marcel Projects The 2009 Leader Boards
Colin Wyers has posted a set of Marcel projections, and well, I just had to slice and dice them. Marcel is Tom Tango's extremely basic forecasting system named after the monkey from Friends. It sets the bar for what other, more professional forecasting systems should strive to surpass. The system consists of:
- A weighted average of the past three seasons' stats.
- An age adjustment.
- Some regression.
That's it. No park-adjustments. No similarity scores. No DIPS analysis. No lineup analysis. No nothing. Even so, Marcel doesn't finish far behind more advanced systems like CHONE and PECOTA in year-end analyses.
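To make the first ingredient concrete, here is a minimal sketch of the weighted average, assuming the 5/4/3 season weights mentioned in the comments below. The function name `weighted_rate` and the sample season lines are hypothetical, and the age adjustment and regression steps are left out on purpose.

```python
# Sketch of a Marcel-style weighted average (step 1 only).
# The 5/4/3 weights come from the comment thread; the season lines are made up.

def weighted_rate(seasons, weights=(5, 4, 3)):
    """Return a weighted per-PA rate.

    seasons: list of (events, plate_appearances), most recent season first.
    """
    weighted_events = sum(w * ev for w, (ev, _) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (_, pa) in zip(weights, seasons))
    return weighted_events / weighted_pa if weighted_pa else 0.0


# Hypothetical hitter: (HR, PA) for the last three seasons, newest first.
seasons = [(30, 600), (25, 550), (20, 500)]
print(f"weighted HR per PA: {weighted_rate(seasons):.4f}")
```

Weighting both the events and the plate appearances means a partial season counts in proportion to how much the player actually played.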
Now on to the fun. Here's how Colin's Marcels foresee the 2009 MLB leader boards in a number of popular hitting and pitching categories:
| Player | HR |
| --- | --- |
| Howard | 37 |
| Dunn | 30 |
| Arod | 29 |
| Pujols | 29 |
| Fielder | 29 |
| Braun | 29 |
| Pena | 28 |
| Dye | 27 |
| Thome | 27 |
| Teixeira | 26 |
| Delgado | 26 |
| Manny | 26 |

| Player | SB |
| --- | --- |
| Reyes | 47 |
| Taveras | 35 |
| HanRam | 32 |
| Ellsbury | 32 |
| Bourn | 31 |
| Figgins | 31 |
| Pierre | 31 |
| Roberts | 29 |
| Crawford | 29 |
| Suzuki | 28 |

| Player | wOBA |
| --- | --- |
| Pujols | .438 |
| Jones | .412 |
| Ortiz | .407 |
| Arod | .407 |
| Holliday | .403 |
| Manny | .401 |
| Wright | .400 |
| Cabrera | .400 |
| Berkman | .399 |
| Howard | .396 |

| Player | PA |
| --- | --- |
| Reyes | 581 |
| Suzuki | 574 |
| Sizemore | 572 |
| Wright | 567 |
| Cabrera | 565 |
| Pedroia | 563 |
| Morneau | 556 |
| Young | 554 |
| Beltran | 553 |
| Utley | 553 |
| Iwamura | 553 |
| Ibanez | 553 |

| Player | AVG |
| --- | --- |
| Pujols | .342 |
| Jones | .322 |
| Holliday | .320 |
| Mauer | .320 |
| Cabrera | .319 |
| Guerrero | .316 |
| Suzuki | .315 |
| Wright | .312 |
| Ordonez | .311 |
| HanRam | .311 |
| Pedroia | .311 |

| Player | OBP |
| --- | --- |
| Bonds | .470 |
| Pujols | .467 |
| Jones | .428 |
| Helton | .426 |
| Manny | .419 |
| Mauer | .419 |
| Njohnson | .418 |
| Berkman | .417 |
| Ortiz | .413 |
| Cabrera | .410 |

| Player | SLG |
| --- | --- |
| Pujols | .625 |
| Howard | .582 |
| Ortiz | .572 |
| Arod | .564 |
| Braun | .562 |
| Cabrera | .555 |
| Holliday | .552 |
| Jones | .551 |
| Manny | .548 |
| Fielder | .544 |

| Player | ISO |
| --- | --- |
| Howard | .301 |
| Pujols | .283 |
| Ortiz | .279 |
| Dunn | .268 |
| Arod | .265 |
| Braun | .261 |
| Pena | .258 |
| Fielder | .255 |
| Soriano | .249 |

| Player | RAA |
| --- | --- |
| Pujols | 33 |
| Cabrera | 18 |
| Howard | 18 |
| Holliday | 18 |
| Utley | 17 |
| Arod | 17 |
| HanRam | 17 |
| Jones | 16 |
| Wright | 16 |
| Fielder | 15 |
| Manny | 15 |
| Braun | 15 |

| Pitcher | IP |
| --- | --- |
| Sabathia | 187 |
| Halladay | 182 |
| Johan | 177 |
| Hamels | 174 |
| Webb | 173 |
| Lincecum | 172 |
| Lee | 172 |
| Burnett | 170 |
| Ervin | 170 |

| Pitcher | ERA |
| --- | --- |
| Lincecum | 3.61 |
| Harden | 3.70 |
| Halladay | 3.78 |
| Sabathia | 3.79 |
| Peavy | 3.80 |
| Webb | 3.81 |
| Johan | 3.82 |
| Hamels | 3.91 |
| Cain | 3.94 |
| Haren | 3.97 |

| Pitcher | SO |
| --- | --- |
| Lincecum | 165 |
| Sabathia | 160 |
| Johan | 154 |
| Burnett | 150 |
| Hamels | 148 |
| Ervin | 141 |
| Cain | 139 |
| Haren | 139 |
| Vazquez | 139 |
| Billingsley | 137 |
Comments
Interesting
Does Marcel factor in the probability of a player strike/lockout? Most of these numbers seem low by 20% or so.
by Eric Simon on Nov 8, 2008 12:36 PM EST reply actions 0 recs
In other words
these are kind of useless.
by Daniel Berlyn on Nov 8, 2008 12:52 PM EST up reply actions 0 recs
In other words, they are heavily regressed.
Marcel isn’t all that smart about playing time. It doesn’t know when players switch teams, get injured, get benched, or whatever. All it knows is how much a player has played in the past. And given only that input, it maximizes its accuracy by heavily regressing playing time.
When I have time later I’ll find last year’s Marcels and post the leaders for PAs, HRs, etc. I think we’d all be surprised how much the sum of predicted totals of the top ten leaders in PAs matches their actual total of PAs (probably 2/3 of them would have 100 PAs higher with 1/3 at 200 PAs lower or something like that).
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Nov 8, 2008 3:39 PM EST up reply actions 0 recs
Interesting. From Tango's 2008 Marcels, here are the projected PA leaders for this past season:
665 Rollins
653 Reyes
649 Sizemore
643 Suzuki
640 Pierre
632 Uggla
629 Zimmerman
628 Jeter
623 Holliday
623 Gonzalez
And the 2008 projected IP leaders:
202 Webb
200 Sabathia
199 Harang
195 Halladay
194 Lackey
194 Haren
194 Hudson
194 Blanton
193 Santana
192 Peavy
Those are MUCH higher than Colin’s Marcels’ PA and IP leaders for 2009, so I’m suspecting there’s something weird with what Colin did. I’ll bring it to his attention.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Nov 9, 2008 11:24 AM EST up reply actions 0 recs
There's a bug in the PA/IP code.
Or rather there was – it was fixed in a later version and I simply didn’t publish those yet. I’m still revising the Marcels code that I have – all rate stats should be unaffected. I’ve been putting off publishing revisions until the Baseball Databank gets released, but since these are getting as much play as they are I’ll see about publishing the revised version later this afternoon.
by cwyers on Nov 9, 2008 2:03 PM EST up reply actions 0 recs
Thanks Colin, I'd love to use revised IP numbers for an article tomorrow.
Do you mind letting us know when they’re updated?
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Nov 9, 2008 3:40 PM EST up reply actions 0 recs
Everything's updated.
Also, real names included.
by cwyers on Nov 9, 2008 9:16 PM EST up reply actions 0 recs
Bonds atop the OBP leaderboard? someone sign that guy, quick!
'That's something we do...thirteen hits and not score'-Terrence Long
by DyeLongJustice on Nov 8, 2008 2:16 PM EST reply actions 0 recs
What Terrence Long Said
Though how he does that without appearing on any other list is left as an exercise.
by klhoughton on Nov 8, 2008 11:33 PM EST up reply actions 0 recs
Since Bonds missed all of 2008...
…he is further regressed to the mean than the other Big Damn Sluggers on the list. I don’t know if that’s necessarily correct in Bonds’ case, but it’s such an unusual situation that I don’t think it matters for the vast majority of players.
by cwyers on Nov 9, 2008 2:01 PM EST up reply actions 0 recs
You just mean further regressed because he doesn't have 2008 data to "pull" his projection up from league average, right?
His great seasons are weighted 4 and 3 without any PAs weighted at 5, right?
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Nov 9, 2008 3:41 PM EST up reply actions 0 recs
Right.
Here’s how Marcels regression works:
x/(x+214)
Where x is the number of PAs in the sample (in this case weighted PAs). Instead of 5/4/3 weights I use equivalent fractions, like 1/.8/.6. Keeps things a lot tidier for me. (It all works out the same in the wash.)
So for a player with 300 PAs in sample, you get:
300/(300+214) = .58
Which means that he gets regressed to the mean 42% (1-.58). For a player with 1600 in-sample PAs (a full-time starter, in other words), you get:
1600/(1600+214) = .88
So only 12% regression to the mean (1-.88).
That is, in fact, reflected in the “R” column in the spreadsheet – that lets you know exactly how far each player’s stats were regressed. Bonds missed a full season, and the one that’s weighted the most heavily, so he gets regressed more than someone like, say, Pujols.
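For anyone who wants to play with the numbers, the regression step can be sketched in a few lines of Python. This assumes the x/(x+214) formula and the 1/.8/.6 weights quoted in this comment, and it reads "regressed to the mean" as a straight blend of the observed rate and the league rate. The function names and the sample rates are hypothetical.

```python
# Sketch of the regression step described above.
# k=214 and the 1/.8/.6 weights are taken from the comment; everything else is illustrative.

def weighted_pa(pa_by_season, weights=(1.0, 0.8, 0.6)):
    """Weighted in-sample PAs, most recent season first."""
    return sum(w * pa for w, pa in zip(weights, pa_by_season))


def reliability(x, k=214):
    """Share of the projection that comes from the player's own stats."""
    return x / (x + k)


def regress(observed_rate, league_rate, x, k=214):
    """Blend the observed rate with the league rate by reliability."""
    r = reliability(x, k)
    return r * observed_rate + (1 - r) * league_rate


for x in (300, 1600):
    r = reliability(x)
    print(f"{x} PAs: reliability {r:.2f}, regression {1 - r:.2f}")

# Hypothetical hitter with a .400 observed wOBA in a .330 league.
print(f"{regress(0.400, 0.330, 300):.3f}")
```

A player who skipped his most heavily weighted season ends up with a smaller `weighted_pa`, so more of his projection comes from the league rate.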
Marcels more drastically regresses playing time, and it only uses two years of data, like so:
.5 * PA1 + .1* PA2 + 200
Where PA1 is PAs in 2008 and PA2 is PAs in 2007. That really lowers the playing time forecast for someone who didn’t play at all in 2008.
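The playing-time formula is simple enough to check by hand, but here is a small Python sketch of it anyway. It assumes exactly the .5/.1/200 formula above; the function name `project_pa` and the sample PA totals are hypothetical.

```python
# Sketch of the Marcel playing-time formula quoted above:
#   projected PA = 0.5 * PA1 + 0.1 * PA2 + 200

def project_pa(pa_last, pa_prev):
    """pa_last: PAs in the most recent season; pa_prev: PAs the season before."""
    return 0.5 * pa_last + 0.1 * pa_prev + 200


# Hypothetical everyday player with 700 PAs in both seasons.
print(round(project_pa(700, 700)))

# Hypothetical player who sat out the most recent season entirely.
print(round(project_pa(0, 500)))
```

Even a two-year iron man only projects for a bit over 600 PAs under this formula, which is how heavily playing time gets regressed.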
[As a final aside – any forecasting system is likely to do better with projecting rates of performance than projecting playing time. The optimal approach involves combining the rate stats from a projection system with a better projection of playing time based upon depth charts and such.]
by cwyers on Nov 9, 2008 9:26 PM EST up reply actions 0 recs
That's phenomenal, thanks Colin.
Next thing you know, everyone will be spitting out projections willy-nilly. We can only hope…
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Nov 9, 2008 9:58 PM EST up reply actions 0 recs
Is it easy enough to download the spreadsheets from
“baseball databank?”
BTW, do either of you (or anyone) know of or have a generic spreadsheet “plug in” where you can simply input the last three seasons of data and the player’s age and get a quasi-Marcels projection? The Marcels “in-season projector” from THT doesn’t quite do the trick at this point, and the “CAIRO” one I found includes all sorts of other data, unless I’m doing it wrong.
Or, I could just go with my generic 5-4-3 or 5-4-3-2 (with some league average regression optional) pseudo-projections using linear weights type stats (since I’m going for WAR anyway, and they wrap up rate and playing time data all in one — which works OK for veterans who have been playing full-time, not so well for part-timers or younger guys).
OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG
by devil_fingers on Nov 9, 2008 10:24 PM EST up reply actions 0 recs
I'm not familiar with BDB, but I assume it's straightforward.
The in-season projector will get the rate stats correct, but the playing time needs to be done separately. As Colin showed, it’s pretty straightforward.
A basic plug and chug projector would be nifty, but why not just use Colin’s Marcels which are done for you already?
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Nov 9, 2008 10:39 PM EST up reply actions 0 recs
Yeah, i got'em again
I would just like to be able to do it myself, for fun.
Nice job, though, Colin. I take it that you generate the RAA stuff by getting the average from the projections and going from there?
OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG
by devil_fingers on Nov 9, 2008 10:59 PM EST up reply actions 0 recs
Sorry, another braa question
It’s just
(wOBA – lgwOBA)*PA
Or is it
[(wOBA – lgwOBA)/1.15]*PA
and then to add in SB/CS to that figure, you do bRAA + SB*0.17 -CS*.033? Or have you already added SB/CS into the wOBA figure?
OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG
by devil_fingers on Nov 9, 2008 11:04 PM EST up reply actions 0 recs
sorry
I take it that “RAA” and “SB_RAA” can be added together to get the total projected “linear weight” run production of the player?
OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG
by devil_fingers on Nov 9, 2008 11:09 PM EST up reply actions 0 recs
If you’re using wOBA, [(wOBA – lgwOBA)/1.15]*PA is the correct way to figure RAA. I figured RAA (and SB_RAA) separately, using my reference set of LWTS from 1993-2007.
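As a rough illustration, the wOBA-based version of that formula looks like this in Python. It assumes the 1.15 scale from the formula above; the function name `batting_raa` and the sample inputs are hypothetical, and this is not the LWTS-based method used for the published RAA column.

```python
# Sketch of RAA from wOBA: [(wOBA - lgwOBA) / 1.15] * PA
# The 1.15 divisor is the wOBA scale quoted in the thread.

def batting_raa(woba, lg_woba, pa, woba_scale=1.15):
    """Batting runs above average implied by a wOBA and a PA total."""
    return (woba - lg_woba) / woba_scale * pa


# Hypothetical hitter: .400 wOBA in a .330 league over 600 PAs.
print(f"{batting_raa(0.400, 0.330, 600):.1f} runs above average")
```

A league-average hitter comes out at zero no matter how many PAs he gets, so the PA projection only matters for players away from the mean.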
by cwyers on Nov 10, 2008 10:58 AM EST up reply actions 0 recs
Sal's Marcels spreadsheet does it pretty much right...
…except for playing time. It’s an easy fix. Open up the hitter’s spreadsheet, go to Cell C13, and change the formula in there to:
=0.5*C12+0.1*C11+200
There’s no similar quick fix for the pitchers’ spreadsheet, unfortunately.
I learned much of what I needed to know for my Marcels from that spreadsheet. A person wanting to know more about projection systems could do worse than to poke around in that and try to figure out how it works.
The Baseball Databank itself is available either in CSV files or as a MySQL database. Excel can import CSV files.
My eventual plan is to (re)publish my Marcels code, along with documentation and a tutorial. That way, everyone has the full source to a basic projection system.
by cwyers on Nov 10, 2008 10:57 AM EST up reply actions 0 recs
awesome
that’s a great, great idea… thanks so much for all of your work on this. It really is a nice service to everyone, especially lazy dumbasses like me.
OMG Banny. FWIW I am only crdtng u w/3 runs allwd bc of DDJ OMFG
by devil_fingers on Nov 10, 2008 12:14 PM EST up reply actions 0 recs
Doesn't predict a single pitcher will get to 200 IP...
IDK ’bout that one
by staplemaniac on Nov 8, 2008 5:39 PM EST reply actions 0 recs
English Translation: Assumes the Giants are sane
Though it seems even more unlikely that someone will overwork Sabathia on signing him to a long-term contract.
(Mets management, otoh, will abuse Johan. They’ll have to if Reyes is putting up those hitting numbers and fielding the way he appears to have this year.)
by klhoughton on Nov 8, 2008 11:35 PM EST up reply actions 0 recs
I'm motivated enough to write an article on projecting/regression, even though I'm not an expert.
You cannot read that IP leader board as saying “no pitcher will throw 200IP in 2009.” You should read it as “No individual pitcher is expected to pitch 200IP in 2009.” SOMEONE (probably many pitchers) will do it, we just don’t know exactly who it is.
Here’s another example. Would you bet on the Miami Dolphins at 50/50 odds to win the Super Bowl? No. How about the Giants? Better idea, but no. The Patriots? Giants? Redskins? No, no, no. In fact, there’s no team with a better than 50% chance of winning the Super Bowl. Does that mean you think NOBODY is going to win the Super Bowl? Uh, obviously not.
(To be technical, the Marcel projections are means, not medians, but the point gets across.)
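A toy calculation shows the same thing for innings. It is only a sketch under made-up assumptions: every pitcher either stays healthy and throws 220 IP (60% of the time) or gets hurt and throws 100 IP, and pitchers are treated as independent. None of these numbers come from the Marcels.

```python
# Toy model: each pitcher's mean projection is under 200 IP,
# yet it is near-certain that somebody in the group clears 200.
# All probabilities and innings totals here are invented for illustration.

def expected_ip(p_healthy, ip_healthy, ip_injured):
    """Mean innings for a pitcher with two possible outcomes."""
    return p_healthy * ip_healthy + (1 - p_healthy) * ip_injured


def prob_at_least_one(p_each, n):
    """Chance that at least one of n independent pitchers has the good outcome."""
    return 1 - (1 - p_each) ** n


mean_ip = expected_ip(0.6, 220, 100)
someone = prob_at_least_one(0.6, 10)
print(f"mean projection per pitcher: {mean_ip:.0f} IP")
print(f"chance at least one of ten reaches 220 IP: {someone:.4f}")
```

The mean for each pitcher sits well under 200 because it averages in the injury seasons, even though the healthy outcome is the more likely one.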
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Nov 9, 2008 9:24 AM EST up reply actions 0 recs
Also, again, the Marcels don't know ANYTHING about playing time other than historical playing time.
They don’t know about injuries, organizational philosophies, pitch counts, rotation depth, bullpen depth, etc. Therefore, to maximize accuracy across the board, heavy regression is applied. Other systems, especially ones that assign playing time “manually,” do much better, and probably WOULD project Sabathia at 200 IP. But you’d also be surprised how heavily regressed (i.e., conservative) those projections are. Just not quite as much.
Beyond the Boxscore // Calling BJ Upton lazy is lazy.
by Sky Kalkman on Nov 9, 2008 9:26 AM EST up reply actions 0 recs
