Friday, August 8, 2014

Advanced Stats: The future of yesterday

In my lifetime, the solar system in which we live has gone from consisting of 9 planets, to 10 planets, then back to 9 planets, and then down to 8 planets.  The demotion of Pluto as a planet caused a bit of a sentimental stir in the science community.  What led to its discovery, and then the discovery of a tenth planet, and finally the reclassification of both as dwarf planets came from a change in what science was trying to answer in the first place:  why did Neptune lose last night’s hockey game?  I mean, why was Neptune’s orbit wonky?  Looking at the data, it was hypothesized that another planet was affecting Neptune’s orbit of the sun.  Then one day, after some legal crap, a fancy flip book, and some coincidental calculation, Pluto was discovered.  To consider Pluto a planet, you had to accept its counter-intuitive behavior as a celestial body and concede that there was really no better way to determine its mass.  But hey, back then, it was a planet!  Science said so.  And because there was no better way to determine if it was a planet, we accepted it was a planet.  Because, science.

Today, the great debate in science is advanced stats in the NHL.  Ok, maybe they aren’t the great debate of our time.  But advanced stats are certainly the topic du jour in an NHL speak-easy near you.  I remember the heady days of slew footing, man-in-the-crease violations, and the bane of six-foot-six superstars everywhere: clutching and grabbing.  Those were topics that dominated discussion around the NHL back when superstars of the 70s were assuming front office jobs and ruining hockey in the 90s.  Today, the topic that everybody (except players and coaches) is talking about is advanced stats.  Spoiler Alert:  I am a data hound.  In my day job I use metrics as a starting point to help solve issues surrounding profitability and on-time performance.  There isn’t a day that goes by where I don’t use metrics at work, which in turn pays the bills.  But, to me, advanced stats are A STARTING POINT TO SOLVE A PROBLEM; they don't define the problem or provide the solution.  They serve as a starting point for discovery.  Why didn’t we meet our margin goal yesterday?  Oh, this analysis over here says my margin in Houston sucks.  I’ll start looking at how Houston is running their operations.

Of course advanced stats are here to stay because they’ve been around forever.  Read that last sentence again.  Ok, read on.  Advanced stats as they exist today show us things that many coaches and scouts already know, but perhaps couldn’t articulate outside of the film room.  Just because coaches likely already know what the stats tell us today doesn't make them any less relevant.  Corsi, Fenwick, PDO, and zone start data are some of the more popular advanced stats in discussion.  I like advanced stats; I love advanced stats in hockey.  However, I am very cynical of the sacred cow status given to them, largely by those in the media and blogging community.  Maybe it’s the nerd subculture that exists within the NHL’s social media demographic, but I am beside myself at the nerd rage that ensues when the sanctity of advanced stats is questioned - much like when the ‘planetary status’ of Pluto was first questioned back in the 1970s.

Having said that, let me also say that advanced stats must be mandatory curriculum for any person filling a roster spot on a professional hockey team.  Zone starts, PDO, quality of teammates, and quality of competition are good indicators of a player’s effectiveness in specific situations.  These things must be considered when vetting candidates through a trade or free agency.  But any scout worth their salt has been able to communicate this to a head coach or front office from a game planning perspective for a long time.  Analyzing where a player starts in certain situations and where they are vulnerable is nothing new.  A good scout can tell you when a player is likely to turn a puck over or make a bad decision, or why a power play is struggling.  A good coach doesn't need many advanced stats to tell him which players are ineffective in key situations.

However, it wasn’t until very recently that we were able to have a statistical representation of those instances: advanced stats.  Now, we can throw the statistical representation of that data into a slideshow and blog about how awesome or lousy some players or teams are.  But the advanced stats summarize what many coaches already know and put it in a much neater presentation format than a scribbled scouting report.  You have a neat statistical representation of a player's efficiency in an easily digestible format.  If you’re looking to fill an open roster spot you’re already using advanced stats whether you realize it or not.  If you’re not directly or indirectly using advanced stats to fill your roster, then you’re probably doing color commentary, studio analysis, or talking about how awesome you were as the Blue Jackets’ first President and GM on TSN after getting canned.

Hockey is not a complicated sport unless you make it that way.  Hockey at its core is simple.  In fact, a good coach doesn’t overcomplicate drills in practice, but looks to maximize the drills’ efficiency.  A well-coached Pee Wee travel team practice will look a lot like an NHL practice.  In fact, many good and effective youth hockey coaches already know 80-90% of the drills used in an NHL practice.  I’m saying all this to emphasize hockey is not a complex sport at its core.  I think many new fans and people who didn’t play competitively wrongly assume hockey is vastly complicated in its systems and schemes.  So when they see an “advanced stat” and understand it, they feel like they’ve cracked the code of hockey.  They fiercely defend that portion of the sport they can relate to.  Seriously people, hockey’s not that tough to understand.

Advanced stats help articulate the ‘why’ in the outcome of a game.  The “advanced” part of the nomenclature is not that the calculations themselves are advanced, but that they are an advanced way of interpreting data - which is a good thing.  Ok, the dudes that came up with THoR use some ridiculously complex calculations.  More on that in a little bit.  However, if you’re a coach in the NHL, advanced stats aren’t going to tell you anything you don’t already know.  In the minds of players, their effectiveness on the ice goes beyond what analysts are able to throw into a workbook and graph.

Corsi, while I understand it, grinds my gears a little bit.  Everyone, including the Corsi fan boys, will admit that the stat is a compromise.  It is the best of what’s around as far as a possession stat.  I agree that, while they tell different stories, Corsi is a better indicator of a player’s contribution to their team’s success than ye olde +/-.  But you have to ignore the counter-intuitiveness of Corsi and that’s what irks me.  There has to be a better way.  If you swear by the sanctity of Corsi, don’t you dare moan when Wisniewski rips four shots in a row four feet wide.  And you Corsi whores better not be the ones yelling “SHOOT IT” when there are four shin guards and a fully dressed goalie filling a shooting lane on a power play.  Context, my dear friends, context - it is impossible to garner context from stats alone.

Most advanced stats provide instant gratification and are often used to answer the “why” of how a team won or lost in the form of a statistical representation.  But the “why” is not the “how.”  Stats will tell you why; a coach will tell you how.  Corsi will tell you that Atkinson firing a snapshot into Lucic’s shin guards is neutral at worst; an NHL coach will find out who Atkinson’s pee wee coach was and slap his mother about the head and face.

There is unfounded resistance to advanced stats and equally there is Jonestown-sian devotion to them.  The vocal nerdy hockey media rips apart anyone who remotely questions the logic of advanced stats.  Critical thinking applied to the theories in advanced stats is often met with ridicule, some of which I’ve experienced first-hand.  On the flip side, there is a lot of push back from the people that actually do the hockey thing for a living.  I understand why.  Ask yourself this:  Would you take workout and dietary advice from the fat guy at work who smokes two packs a day and eats nothing but Speedway hot dogs and ding dongs all day?  Now think of it this way: you’ve spent 40 years playing, coaching, and loving hockey, and you get ripped for not taking player utilization advice from a guy who can’t even run a mini-mite practice.  Can a beefcake bro in an Iron Man T-shirt and a Contra Spray Gun tattoo be considered anything other than fake-geek?  You see where I am going with this now?

One of the things that define a hockey player is their conviction to a belief structure in specific situations.  For example: passive or aggressive pressuring of the points during a penalty kill?  “Five on a dice” even strength D-zone coverage or roving center?  Not shooting the puck on a 2-on-1 when your linemate’s relative position to the puck dictates that you should shoot it.  Coaches call this ‘buying into the system.’  And those systems are just the empirical knowledge of advanced stats, supported by years of anecdotal evidence and reinforced by the ability to demonstrate those concepts during practice while seeing the results in games.  The circle of life is renewed as player becomes coach and coach becomes management.  That is the culture of hockey.  Better coaches, players, and GMs are always willing to look at things a different way - if those different ways aren’t gimmicky.  A head coach and a GM are going to approach advanced stats differently.  A player or coach will say “advanced stats don’t tell me anything I don’t already know” and a GM will say, “This kind of information is very useful in helping me fill a roster spot from the players that are available.”  And not to beat a dead horse, but when Tyutin misses a wide open net, nobody’s congratulating him on a positive Corsi event.

So, for the better part of two pages I’ve talked about how advanced stats as they pertain to situational usage don’t tell a coach/scout anything they don’t already know, and that Corsi and Pluto (the dwarf planet, not the dog) are related.  I’m a data guy, a couple of my DKM/DBJ compatriots are data guys, and we’re going to do some stuff with data.  Of all the advanced stat stuff out there, I like Total Hockey Rating or THoR.  It won the MIT Sloan Sports Analytics award in 2013, long before most people knew Corsi was a name and not an acronym.  THoR gets a bad rap from awkward sci-fi movie watching nerds because the data inputs it uses are suspect – it solely relies on the NHL’s RTSS.  I think when real nerds at MIT, who were Valedictorians and post-secondary standouts, out-smart the socially awkward sci-fi watching introverts, the lesser nerds get ragey which is why they poo-poo THoR. 

Anyways, THoR uses some real heavy-duty genius-level mathematics, in addition to commonly accepted advanced stats, to determine how many wins a player is responsible for generating.  But what I like about THoR is one of the calculations they use: they weigh the cause and effect of every action that happens on the ice.  So, when a check, or a shot, or a pass happens, they calculate what happens in the five seconds after that event.  I like that in the sense that it indirectly looks at Corsi events and considers what happens in the next few seconds after each one.  You can be jealous of THoR and rip on it all you want, but it considers whether ripping a shot 4 feet wide is a good thing or not.  It also helps put statistical merit to things like the value of out-hitting another team beyond the intangible physical toll.  If there is a set of stats a hockey coach should try to look at, it’s some of the theories and calculations that are considered in THoR.

I understand Corsi as it relates solely to possession, I do.  I can’t tell you how many blog pieces I’ve read that say “ignore the conflicting/counter-intuitiveness of the data that goes into Corsi.”  I get it.  It is better than anything else out there - just like the mini-disc was.  But at its fundamental core, how can we say that things which often generate turnovers are positive benchmarks for possession?  And a goalie came up with this stat, for crying out loud?  Goalies take too many pucks to the head and are quirky.  So this weekend I’m going to watch some CBJ games and do three things:

1.       Augmenting the benchmarks of THoR, I will establish a percentage of how many blocked shots and missed shots turn into turnovers within 3 seconds of the event.  THoR uses 5 seconds, which I think is a little too long.  Hell, they go to MIT and I dropped out of college twice, so maybe I should use their standard of 5 seconds.  It will either confirm the grain of salt I eat with my Corsi, or it will make me shut up.  To me, through 30 years of anecdotal evidence, in an even strength situation, a missed shot or a blocked shot from the point often leads to a turnover.  We’ll see.  And a turnover in this instance is defined as a blocked/missed shot going to an opposing player, either directly or through the course of a commonly defined turnover.


2.        Corsi does a great job showing comparative data on possession.  It doesn’t show me diddly-poo on the efficiency of those events.  Without giving away my formula for fear it will be stolen before my findings are published, I’m going to borrow from the existing Corsi structure and augment it to show offensive efficiency.  The question I will look to answer: Are the shot attempts high-quality chances or mostly garbage?


3.       One of the first mainstream possession studies done was the USA Hockey puck possession study conducted during the 2002 Winter Olympics.  Twelve years ago they were able to establish an average baseline of ‘for every X minutes the player was on the ice they had Y seconds of puck possession.’  How was this possible?  That kind of technology doesn’t exist today, so how could they have had it back then?  EA Sports are the only ones with this kind of technology to calculate “ATTACK TIME.”  Well, I know how they did it and I’m going to apply it to the CBJ game footage I watch.  I’m going to track possession time in the offensive zone and come up with a possession efficiency stat derived from Corsi.  Hey Edmonton Oilers, 2002 called and they want their advanced stats back.
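For the fellow data hounds, here is roughly how the bookkeeping for items 1 and 3 could look once the game footage has been hand-logged.  This is only a sketch under my own assumptions, not THoR's model and not USA Hockey's method: the `Event` record, the event labels ("missed_shot", "blocked_shot", "possession"), and the rule that the first change of possession inside the window decides the outcome are all made up for illustration, and the sample events are invented, not real game data.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float  # seconds elapsed in the game (hand-logged from footage)
    team: str    # team that performed the event
    kind: str    # hypothetical labels: "missed_shot", "blocked_shot", "possession"


def turnover_rate(events, window=3.0):
    """Share of missed/blocked shots followed by an opposing possession within `window` seconds."""
    ordered = sorted(events, key=lambda e: e.time)
    attempts = 0
    turnovers = 0
    for i, ev in enumerate(ordered):
        if ev.kind not in ("missed_shot", "blocked_shot"):
            continue
        attempts += 1
        for later in ordered[i + 1:]:
            if later.time - ev.time > window:
                break  # nothing happened inside the window
            if later.kind == "possession":
                # Assumption: the first team to gain the puck decides the outcome.
                if later.team != ev.team:
                    turnovers += 1
                break
    return turnovers / attempts if attempts else 0.0


def possession_seconds_per_minute(possession_seconds, ice_time_seconds):
    """The 'Y seconds of possession for every minute on the ice' baseline."""
    if ice_time_seconds <= 0:
        return 0.0
    return possession_seconds / (ice_time_seconds / 60.0)


# Invented sample log, for illustration only.
sample = [
    Event(10.0, "CBJ", "missed_shot"),
    Event(12.0, "BOS", "possession"),   # opponent recovers 2 seconds later
    Event(50.0, "CBJ", "blocked_shot"),
    Event(51.0, "CBJ", "possession"),   # shooting team recovers
    Event(100.0, "CBJ", "missed_shot"),
    Event(104.0, "BOS", "possession"),  # opponent recovers 4 seconds later
    Event(200.0, "BOS", "blocked_shot"),
]

rate_3s = turnover_rate(sample, window=3.0)
rate_5s = turnover_rate(sample, window=5.0)
per_minute = possession_seconds_per_minute(45, 900)
```

Running the same log through both a 3-second and a 5-second window is the cheap way to see how much the choice of window moves the answer before arguing about which one is right.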


I will publish my findings within the next few weeks and share all the variables of my experiment - even if the results show I should get “CORSI” tattooed on my face.  I will disclose what game(s) I watched, any errors in data collection, and how many beers we consumed conducting the experiment.  Heck, I might even call an NHL coach or two to pick their brain about advanced stats!  I am excited to test my theories regardless of the outcome.  More importantly, I am excited to see what things we’ll discover that we hadn’t anticipated.  And while I may not discover the next Pluto, there’s a very good chance I'll end up staring at Uranus.