Tuesday, April 10, 2007

Lack of Dominance in Chart Form

All the cool kids have started reproducing graphs from fangraphs showing win expectancy -- basically how an individual's actions affect the team's chances of winning. They're mildly interesting, and do give you a quick snapshot of the roller coaster of emotions that can occur in a game.

But something about them bothers me. In many cases, it seems like they're being attributed more value and worth than they really have. In most cases, the graph is simply stating the obvious, but because it's wrapped in numbers, it's supposedly superior to observation, or it somehow enhances our analysis of the game. I don't quite see it. Every now and then, it does show a hidden play -- like a 6th-inning Jose Vidro GIDP -- but for the most part the key plays are the ones that even your baseball-hating gf could tell you were important. (Case in point -- you mean ARod's 9th-inning grand slam was important???)

Now this isn't to say that I'm anti-stats, just that often what they're telling you is the same thing that anyone can see with their eyes. The example that sticks out in my mind is SBF from Nats320 talking about Ryan Zimmerman's defense last year (which I can't find on Google, of course). SBF isn't much for stats, but he's got a good eye, having been around the game for a while. He said that Ryan Zimmerman was sensational going to his right or to his left, but that it was the shots hit right at him that he had a hard time fielding, especially liners. Well, Baseball Musings does a pretty sophisticated defensive analysis which tries to measure how well players do in certain situations relative to other players. Here's what they found for how Zimmerman does with line drives. If it's hit right to him (the middle of the 'vector' axis), he does poorly, turning fewer of those balls into outs.

The numbers and the scouting eye say the exact same thing.

Now there are certainly cases where they won't match up. And the numbers give you the advantage of "being near" and "seeing" players the scouts can't get to. But there's a lot of overlap.

If I had an editor, this is where they'd urge me to hit ctrl+a then the delete key, because that's all throat clearing for what I wanted to get to: John Patterson.

We've seen his lack of dominance with our eyes. But do the numbers agree?

Thanks to the greatest invention since John Crapper and Willis Carrier, Baseball-Reference.com, we have access to pitch-by-pitch data. The results of each individual pitch can tell us how he's really doing, and whether he's pitching better than his overall line suggests.

If you want to follow along, here's Patterson's page. Just click on the "Pitch Data Summary" link right below his pitching stats.
Year   Strk%   1st%   StS%   StF%   StI%
2005    64%     59%    16%    30%    27%
2007    54%     39%    11%    24%    36%

Year   SOc%   Cntc%   3-0%   0-2%
2005    38%    78%     5%    20%
2007    25%    85%    11%     9%

What the hell are we looking at?
--Strk% is the percentage of strikes thrown. You can see that he's throwing 10 percentage points fewer, and has tossed almost as many balls as strikes.
--1st% is first-pitch strikes, and this is where he's really collapsed. He's falling behind on the first pitch to 61% of the batters he faces.
--StS% is the % of swinging strikes.
--StF% is the % that are fouled off. Looks like he's throwing fewer 'tough' pitches.
--StI% is the % that were put in play in fair territory. Batters are simply having an easier time putting his pitches into play: about a third more often than in '05 (36% vs. 27%).
--SOc% is the % of his Ks that are called third strikes. Usually you get a called strike three when you fool someone.
--Cntc% is how often batters make contact when they swing. He's missing fewer bats with his pitches. When they swing, they're hitting it.
--3-0% is the percentage of his plate appearances that reach a 3-0 count. Ouch. His lack of command really shows up here.
--0-2% is the same, but with 0-2 counts. He's not throwing enough strikes to get himself into a good position.
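One thing worth keeping straight when comparing these rates: a drop from 64% to 54% is 10 percentage points, while a rise from 27% to 36% is a relative increase of about a third. Here's a quick Python sketch that computes both kinds of change for every stat. The RATES dictionary just transcribes the two tables above; the compare function is a hypothetical helper, not anything Baseball-Reference provides.

```python
# Patterson's pitch-data rates (percent), transcribed from the tables above:
# stat -> (2005 value, 2007 value)
RATES = {
    "Strk%": (64, 54),
    "1st%": (59, 39),
    "StS%": (16, 11),
    "StF%": (30, 24),
    "StI%": (27, 36),
    "SOc%": (38, 25),
    "Cntc%": (78, 85),
    "3-0%": (5, 11),
    "0-2%": (20, 9),
}


def compare(old, new):
    """Return (percentage-point change, relative change in percent)."""
    point_change = new - old
    relative_change = (new - old) / old * 100
    return point_change, relative_change


if __name__ == "__main__":
    for stat, (y2005, y2007) in RATES.items():
        pts, rel = compare(y2005, y2007)
        print(f"{stat:6} {y2005:3d}% -> {y2007:3d}%  {pts:+4d} pts  {rel:+6.1f}%")
```

Run it and Strk% shows up as -10 points but -15.6% relative, while StI% is +9 points but +33.3% relative. Both framings are legit; they just answer different questions.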

Sum it all up, and the stats paint a picture of a pitcher with command problems, who's falling behind batters and getting hit when he leaves fat strikes over the plate.

Does that profile look like John Patterson? It does to my eyes.

He can get by without velocity, especially early in the season. I worry less about that than I do his inability to control his curve or his slider. Until he's able to throw them for strikes, batters can sit on his slower fastball. But once (if?!) he gets those working, it won't matter if he's throwing 82 -- as anyone who saw Livan almost no-hit the Nats can attest.


  • Chris, I don't understand your complaint. You're complaining because numbers usually reflect what you see on the ballfield? Isn't that a good thing? If the numbers and your eyesight never agreed, which would you trust? And isn't the point of the numbers to identify the 10% to 20% of times eyesight might be misleading?

    IMO, WPA is effective precisely because it reflects the emotions of the game so well (as opposed to a boxscore, which reflects performance). And on a play by play basis, you will definitely find some surprises in virtually every game.

    By Blogger studes, at 4/10/2007 9:52 AM  

  • The graph/chart validation for everything is just a natural reaction to about 100 years of "This is what I see and I know what I'm talking about" analysis that did ok, but fostered a bunch of wrong theories. There are worse things than having to back up what you say. It sounds like you just wanted to bring up JP again and were annoyed you had to drag out a chart lest no one believe you.

    By Blogger Harper, at 4/10/2007 9:55 AM  

  • As usual with my stream of consciousness rants, I didn't really make myself clear.

    The charts and graphs by themselves aren't bad things. And, yes, there are times when they do add things. But there are also many times when they don't do anything at all and are ascribed much more value than they provide, as if the stats trump observation, when they often don't.

    What fangraphs is doing is amazing. It's a site I do use on a daily basis, and the sheer volume of data they're presenting and making available is incredibly useful. This wasn't meant to demean what they're doing.

    As to your second paragraph, Dave, I'm not sure that that's true. Anyone watching most of these games carefully can be in tune with these fluctuations of chances. Maybe we can't quantify that a double play cost 10% of our chances, but we know it was a killer.

    Where the stats are valuable, and I alluded to this in the post, is in letting us 'see' things we didn't actually watch. I only caught the first few innings of the Giants/Padres game, but I know how the end played out because of the graph.

    It's because the numbers DO match observation so closely that it has value. If they differed, what value would it really have?

    By Blogger Chris Needham, at 4/10/2007 10:11 AM  

  • I agree that the purpose of the chart is to convey your feelings. On that, it's perfect. It reflects what you see, and there are NO SURPRISES. (Or virtually none... the few surprises would be mild ones, and a bit interesting, especially with stolen bases, etc.)

    But, the larger purpose is that now that we've been able to quantify those feelings, we can add them up.

    No more are we going to be able to forget about ARod's slam. It will no longer be a footnote. Rather, it will be a looming shadow, one that will always be present, because we were able to quantify our feelings.

    WPA said "I love you", and no one's going to take it back. It will be there from now until eternity.

    By Blogger Tangotiger, at 4/10/2007 11:21 AM  

  • Sure, to an extent.

    Let's say, though, that ARod comes up for the next 30 games in the exact same situation and strikes out like Casey.

    The stat and the feeling are both going to reflect each other again.

    Now sure, that's an extreme, but even to a lesser degree, if he gacks away in critical chances next year, both feelings and WPA are going to reflect that, even with what he did there.

    It's really just a way of explaining words with a chart. There is some value in that, but I don't think it's revolutionary or anything.

    By Blogger Chris Needham, at 4/10/2007 11:25 AM  

  • I'm not sure it needs to be revolutionary, and neither is, say, Leverage Index.

    Yes, it simply translates whatever adjective-du-jour is in play into a chart and numbers. Isn't that good? This would cut out 90% of the drivel we hear and read.

    ARod's WPA was +.78, while the LI was 10.9. That's it. Take the 1 million words written on this subject and boil them down to that one line. People want to be part of the story, and they don't need to be. The WPA/LI does that in two numbers.

    All I want to hear is what ARod himself has to say about how he felt and how he approached the PA (and I want to hear from the pitcher). I don't care what ESPN mouthpieces think.

    By Blogger Tangotiger, at 4/10/2007 11:32 AM  

  • You're right. Words can be overkill.

    But there's nothing compelling about those two numbers by themselves, especially since they really lack any sort of wide context. I know what they mean, but I don't feel them the way I feel a .400 OBP or a .280 BA.

    Of course, much of that comes with exposure to the numbers.

    But the OBP and BA are also things that you can't "see" in an individual game. If there weren't any stats or names on jerseys, you wouldn't necessarily be able to tell who were the best hitters on those two stats.

    But in that same situation, you can probably figure out pretty easily (for the most part) who's leading in WPA.

    Like I said, the biggest use for it or any stat isn't in any individual game anyway, but in its ability to round up large chunks of data and performances over time. We can't be everywhere and see everything. But the stats can do a good job of being our proxy eyes.

    By Blogger Chris Needham, at 4/10/2007 11:37 AM  

  • Yes, it's true, I am not much on stats. But, watching Ryan Zimmerman play at every home game, you can see he struggles with balls hit right at him. He is tentative, almost worried. Barry Larkin was specifically working with Ryan on that exact problem this spring. Barry put Zimmerman on his knees so he could only use his glove and upper body to stop liners and grounders head on. It was fascinating to watch this drill, as Larkin FIRED AWAY!! Of course Zimmerman already has three errors credited to him in the very first week. None on balls hit right at him. Thanks for mentioning my opinion.

    By Blogger Screech's Best Friend, at 4/10/2007 12:03 PM  

  • LI and WPA require exposure to feel the meaning, just as any other stat in the world does.

    By Blogger Tangotiger, at 4/10/2007 12:04 PM  

  • Are they going to get that exposure though? They're descriptive stats, telling us what has happened. And I'm not sure they're predictive at all.

    OBP is mildly predictive. If I know that Nick Johnson has a .400 OBP, there's a good chance he's going to battle hard in this game and get on base.

    Now maybe in ARod's case last year, there was a bit of predictive ability in how poor he was doing, but does it carry over?

    Until it has some value beyond telling us what we're seeing, I'm not sure it's ever going to develop that sort of context.

    It's a fun stat. And it does tell us what makes up a win -- what the building blocks of individual games are. And it does tell us who has contributed the most to the actual bottom line over the course of a season. That's valuable to an extent, I guess.

    By Blogger Chris Needham, at 4/10/2007 12:08 PM  

  • Tom,

    I have a hard time understanding how "ARod's WPA was +.78, while the LI was 10.9." are anything other than "drivel". They're contextless numbers, and in no way does that make for evocative or interesting writing.

    I just don't understand what value they bring to the table.

    By Blogger Yuda, at 4/10/2007 1:16 PM  

  • They are contextless *to you* (right now anyway). They are full of context.

    The LI of 10.9 happens to be the highest LI in baseball. LI is leverage index, and an LI of 10.9 means that that particular PA can influence the game as much as 10.9 random PA in a game. (LI = 1.00 is average). That's 11 ARods with one swing of the bat.

    Since a game starts at .500, we know the winning team, altogether, has to add +.500 wins to win the game. ARod, given the situation he came up in, turned a .280 chance of winning into 1.000, or added +.72 wins. That's enormous.

    As for making "interesting writing": it's certainly not less interesting than most of the drivel out there.

    By Blogger Tangotiger, at 4/10/2007 3:40 PM  

  • So, in other words, you needed two paragraphs to tell me that having one of baseball's best hitters at the plate in a key situation is a good thing?

    Stats have their value. But in this case, they're not bringing anything to the table and are instead making it sound less like a human pursuit and more like endless spreadsheets.

    By Blogger Yuda, at 4/10/2007 3:51 PM  

  • Got to say, in virtually every game I've personally charted with WPA, I've found something that surprised me. The impact of a particular play was more in doubt than I thought, or had a bigger impact. Emotions tend to take things to the extreme; WPA adds reason.

    As Tom says, these stats will start to make sense with exposure. That's why I think we should encourage people to use these graphs more instead of less. The good thing is that they have strong context (LI and WPA both use 1.0 as a comparison point, but in different ways).

    Sure, they aren't predictive, but so what? The score of a game isn't predictive. There's a wide use for stats that aren't predictive.

    Here's a question: do you like to read box scores? WPA graphs and logs are just a different type of boxscore. In fact, I'd love it if each boxscore was accompanied by a WPA graph. That would tell the story of a game.

    By Blogger studes, at 4/10/2007 3:55 PM  

  • One thing it really has going for it is that it's intuitive. It makes sense when you look at a graph, and the methodology can be explained relatively simply.

    The analogy to a box score is a good one, and I don't have any objection to either. My original point, to take it a step further, is that it's also analogous to a good game story.

    Sure, a gamer probably misses how important some individual plays are, but, for the most part, the minor variations don't really add up to a whole lot. But when someone grounds out in a tie game with the bases loaded, that's going in the gamer. And when ARod hits a walk-off slam in a game his team is trailing, that's probably in the headline. ;)

    I wasn't disparaging the stat. I was just saying that, for the most part, it's not adding anything new. It's just a different way of looking at and presenting information.

    By Blogger Chris Needham, at 4/10/2007 4:00 PM  

  • Hey, don't get me wrong. As a designer, I'm a visual person by nature. I'm all for good, useful illustrations accompanying boxscores and/or articles. And, in many cases, win-expectancy graphs do just that.

    But I hope I never read a gamer about a game with a walk-off grand slam that only says "ARod's WPA was +.78, while the LI was 10.9." That's not nearly as evocative of the situation as even something as simple as "walk-off, two-out grand slam".

    By Blogger Yuda, at 4/10/2007 4:09 PM  

  • John: I wasn't doing anything of the sort.

    I was trying to explain LI and WPA to you, and not trying to use two paragraphs to explain anything about ARod's play.

    WPA and LI are supposed to tell you about the human pursuit, not remove it. They're trying to clear all the numbers and gunk away and let you focus on what you should: the actual game.

    While nothing can compare to actually watching that Cubs game, I'd rather have this accompany me than the endless drivel I read about that game:

    By Blogger Tangotiger, at 4/10/2007 4:28 PM  

    "I was just saying that, for the most part, it's not adding anything new. It's just a different way of looking at and presenting information."

    I think this is the point we'll disagree on. I've spent time logging games with a WPA spreadsheet, and it's taught me to watch the game in an entirely new way. I have an entirely different feel for the flow of the game, and the impact of particular situations.

    Plus, I think there are a lot of baseball people (read: announcers) who could learn a lot about key situations by using WPA. How many times have you heard an announcer say the key play of a game was a sacrifice bunt in the sixth inning, or some such thing?

    You may not pick up on the differences between WPA and watching a game by looking at the game graph. Probably the best way to understand the difference is to log or follow a game play by play. If you do that, and you still don't get anything new from it, well, c'est la vie!

    By Blogger studes, at 4/10/2007 4:29 PM  

  • If the standard is "it's better than a crappy hard-headed announcer" then any and all stats are valid. ;)

    I understand what you're saying, and I haven't done enough to acknowledge the little things it does.

    The ARod homer is a stupid example for us to be chipping back and forth, because anyone and everybody can tell that that was the key play of the game and that it was a high-leverage situation.

    Where WPA excels IS those little plays -- the value of that sixth-inning bunt, the value of that double play the team couldn't turn. It's good at the little things, the ones that add up to wins and losses but get buried in the boxscore.

    If you've got a gamer, a boxscore and a WPA chart, you do have practically everything you need to know about the game.

    But if you only have one of them, you've still got a pretty good idea of what happened, even if not necessarily on a micro level.

    By Blogger Chris Needham, at 4/10/2007 5:10 PM  

  • Bill James once said something to the effect that "We look at stats because we can't watch every play of every game." In other words, stats are a shortcut to watching the plays. But they both represent the same reality, with varying degrees of bias and error.

    Having had some experience with doing both in 2005, I found that things like WPA are interesting at first, but you quickly become accustomed to the information they provide and work it into the way you watch a game, and then it becomes less interesting.

    What needs to be done now is to develop some ability to analyze WPA graphs for "types" of games, and patterns among game flows.

    By Blogger DM, at 4/12/2007 10:02 AM  
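Tangotiger's WPA arithmetic from the thread can be written out in a few lines. This is a minimal Python sketch, assuming a play's WPA is simply the batting team's win expectancy after the play minus its win expectancy before it. The function names and the intermediate win expectancies in the sample game are hypothetical; only the .280 to 1.000 figures and the .500 starting point come from the thread. Because each play's WPA is a difference, the per-play values telescope, so a winning team's plays always add up to +.500 from a .500 start.

```python
def wpa(before, after):
    """WPA of one play: change in the batting team's win expectancy."""
    return after - before


def total_wpa(path):
    """Sum per-play WPA along a sequence of win expectancies."""
    return sum(wpa(a, b) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    # The walk-off slam as described in the thread: a .280 chance of winning turned into a win.
    print(f"{wpa(0.280, 1.000):+.3f}")  # +0.720
    # Hypothetical home-team path: starts at .500, ends in a win (1.000).
    game = [0.500, 0.560, 0.430, 0.380, 0.280, 1.000]
    print(f"{total_wpa(game):+.3f}")  # +0.500
```

The individual swings in the hypothetical path don't matter for the total; only the start and end points do. That's why WPA is useful for the "little plays" argued about above: it splits that fixed +.500 among the plays that produced it.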
