Descriptive vs Predictive eGF vs Corsi for Forwards

When I created eGF the goal was for it to be a descriptive stat – something that helps to explain what’s going on (in terms of how goals are scored) better than currently available stats.  To establish its value over something like Corsi, I need to show that a) it is a better descriptive stat than Corsi (i.e. it correlates better with current GF) and b) it is at least as useful as a predictive stat.

Before I get into any charts or data, I want to explain my rationale on predictive stats.  I think you should always put yourself in the role of a decision maker: whether it’s a fantasy sports GM or an actual GM, the role of a predictive stat is to help determine the kind of year a player will have.  In other words, will a specific player help you win in the future?  Because I believe there is a strong temporal element to hockey (meaning players or teams can go hot or cold), and because random sampling to determine the predictive nature of a stat (i.e. taking 41 random games of data to predict the other 41) removes that element, I always gauge predictive stats on a season-to-season basis.  Basically, if a player puts up good numbers in a stat one year, how likely is it that those numbers lead to success (as measured by goals for) the next year?

This post will deal entirely with forward data; I will do a follow-up with the defense data at some point in the future.  Also, so as not to cloud the picture, I am using only data (for descriptive stats) from 2014/2015, and only from players that also played in 2013/2014.  This allows us to set a baseline for the predictive part.  The reason I limit the data to 2014/2015 is that 2010/2011 to 2013/2014 was all in my training data set for eGF, and if you build a model and then compare it to the training data you had better hope it does well.  If it’s a regression, it will almost automatically do well (provided your independent variables were chosen well).  So let’s cut straight to the chase and see if eGF is a better descriptive stat for forwards than CF.

20160118 Forward GF20 Current

As you can see, eGF20 outperformed CF20 by a pretty wide margin (statistically significant to 99.99%) and was in turn outperformed by PTS20 by a similar margin (statistically significant to 100%).  A couple more points of explanation here: GF20 is a team-level stat – meaning it is the observed number of goals that a team scored (per 20 minutes) while that player was on the ice.  eGF20 is also a team stat in that it is the expected goals for the team while that player was on the ice.  PTS20 is an individual stat, and represents the points that individual received (per 20 minutes) on the GF that his team got.  Now, it is obviously problematic to use points because points can only exist if goals are scored – by default this means the correlation with points will be reasonably high.  The other problem with points is that it has no value as a descriptive stat for defensive measures.

20160118 Forward GA20 Current

This chart illustrates that point perfectly.  PTS20 has almost no correlation with GA20, and as a descriptive measure it is useless.  eGA20, on the other hand, has a reasonable correlation (it’s low because I believe forwards have less control over GA than defensemen do) that is significantly higher than that of CA20 (statistically significant to 99.8%).  So eGA is a better descriptive measure than CA, and eGF is a better descriptive measure than CF, for forwards in this data set.

For descriptive purposes there is no reason to use Corsi for forwards when you could use eGF instead.  Now the question is – is there a reason to use it for predictive purposes?  I’m going to channel @IneffectiveMath here and let you know the short answer is no, because there are no particularly great ways to predict next season’s performance (as measured by GF20/GA20/GF%).

To establish a baseline, let’s use the same data set as we did for the descriptive stats and look at their predictive abilities.  That means using 2013/2014 statistics to predict 2014/2015 performance.  Here is the chart:

20160118 Forward GF20 Previous

We cannot say that there is a significant difference (>95% confidence) between any of the measures in predicting future GF20.  That being said, previous GF20 (pGF20) was the best predictor in that data set, followed by eGF20, then by CF20.  There appears to be some ability to gauge a forward’s capacity to generate goals for based on his previous year’s stats (no matter which stat you use).

20160118 Forward GF20 Previous All Seasons

When you look at the above graph, which covers all forward data from 2011/2012 to 2014/2015, you can see two things: 1) Excel is a terrible program that wouldn’t let me change the style of this chart for some reason, and 2) the relationship holds true with more data.  None of the correlations are different from each other to a statistically significant degree (95% confidence), but they are all significantly different from zero and show some ability to predict GF20 based entirely on the previous season’s numbers.  Once again pGF20 outperformed eGF20, which outperformed CF20 (for this data set), but again not to a statistically significant level.

This is more or less in line with what I’d expect.  A forward has a significant degree of control over the offensive zone – but he is just one of 10 skaters on the ice.  Offensive pressure might be primarily controlled by 5 (or 6) players (the three attacking forwards, two defensemen, and possibly a center), so a player’s offensive skill (or lack thereof) should be somewhat consistent season to season – though masked by other factors (linemates, opposition, system, etc.).  For defensive play I would expect that the average forward has much less to do with defensive success, and so I would expect to see less correlation with future GA.  Which is exactly what you see in the next chart:

20160118 Forward GA20 Previous

These correlations are not statistically different from each other (to 95% confidence), but eGA20 outperformed CF20, which outperformed GA20.  This chart tells me that systems, defense, and goaltending are likely bigger parts of predicting GA20 than forwards are.  Things don’t change much if you add in more data (2011/2012 to 2014/2015): all the R^2s stay below 0.1 and none of the differences are statistically significant (to 95% confidence).  The reason I included GA20 in the prediction stats is the predilection people have for using CF% or GF% as a measure to evaluate a player.  As we can see, for forwards what’s most in their control (as it relates to goals) is goals for – they have much less control over goals against.

20160118 Forward GF%

That’s why charts like the one above don’t necessarily make a lot of sense to me.  Well, they make sense, it just doesn’t make sense to look at them and think you’re looking at something useful.  Predicting a GF% for a player relies on so many things going right.  On the offensive side he needs to maintain his SH%, maintain the quality of his chances, and likely maintain the same system and his regular linemates.  On the defensive side he has less control (as mentioned previously), but has to rely on his goalie saving the same percentage of chances and on the quality of chances against remaining the same.  Through system changes, player changes, and injuries it is very unlikely that the prediction endeavour (based on predicting his GF%) will be successful.

All of that being said, when using previous year information to predict current year GF%, CF% had a higher correlation than eGF%, which had a higher correlation than GF%.  None of the differences were even a little bit significant from a statistical perspective (around 50% p-value).

eGF has been shown to be a superior descriptive statistic to Corsi in measuring both goals for and against, for forwards, while simultaneously being statistically similar as a predictive measure for goals for, goals against, and GF%.  In my (very biased) opinion, this removes the need to look at Corsi metrics to quantify the play of individuals, as eGF is equal or superior in every (statistical) regard.

 

eGF Finally Fully Explained

Around March of 2015 I launched my webpage (www.xtrahockeystats.com) which was centered around the concept of eGF (expected goals for).  In hindsight that was a stupid name, it should have been just eG (expected goals) but I’ve been using eGF for almost a year and it’s mentally drilled into my brain hole.  The good news about using eGF as my acronym is it allows easy distinction from some of the other expected goal models out there that use the acronym ‘xG’.  This article will be my attempt to explain the key differences between eGF and other shot location based models.

Since I have never publicly explained the process by which eGF was created (or how it’s calculated), and since I am so familiar with it, I might gloss over details that others find important – please feel free to ask any questions that you might have.

Before I go into detail about exactly how eGF is calculated I think it’s important to mention that I created my own scraper for my website.  The data, therefore, will not exactly match other sites like hockeyanalysis.com or waronice.com but it should be very, very close.

The reason I created eGF is because the concept that all Corsi events are equal drives me crazy.  They are so obviously not equal, and players play the game (i.e. generate and receive Corsi events) in such vastly different manners that it’s insane (to me) to think that the difference in events ‘even out in the long run’.  So my first inclination was to include some kind of shot location metric in my eGF model to compensate for that.

I’m not going to post any charts showing that shot location matters.  If you’re familiar with the War-On-Ice HD SC metric or the concept of HD SV% vs MD SV% then you know that certain locations carry a higher shooting percentage – which is proof enough for me that shot location matters a great deal.  What I actually did was break the zone into 27 separate areas and then looked at SH% from each of those areas further broken down into shot type.  Each area/type was then given a score based on the conversion rate for that area.
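For illustration, here is a minimal sketch of what that binning step might look like; the 3x3 grid, zone boundaries, and score cut-offs below are made up for the example (the real model uses 27 areas and its own fitted values).

from collections import defaultdict

def zone_of(x, y):
    # Hypothetical coarse grid over the offensive zone (the real model uses 27 areas).
    col = min(int(abs(x) // 10), 2)    # bands of distance from the end boards
    row = min(int((y + 42) // 28), 2)  # bands across the width of the ice
    return (col, row)

def score_zones(shots):
    # shots: iterable of dicts with keys x, y, shot_type, is_goal (illustrative field names)
    attempts, goals = defaultdict(int), defaultdict(int)
    for s in shots:
        key = (zone_of(s["x"], s["y"]), s["shot_type"])
        attempts[key] += 1
        goals[key] += s["is_goal"]
    scores = {}
    for key, n in attempts.items():
        sh_pct = goals[key] / n
        # Illustrative cut-offs: a better conversion rate earns a higher score (0-3).
        scores[key] = 3 if sh_pct > 0.12 else 2 if sh_pct > 0.07 else 1 if sh_pct > 0.03 else 0
    return scores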

Just that step, in my mind, made my model superior to Corsi in predicting current goals for.  That is to say, predicting how many goals these Corsi events should have led to if the shooter were average and the goalie was average.  I don’t have the data from just this step anymore – but it still didn’t feel right.  Hockey is a sport where there are so many moving parts that it just feels wrong to look at a particular instant in time (in this case a shot event) and pretend that we have really any idea about what’s going on in the game.  In fact, with the data provided in the NHL PbP file we have almost no knowledge about what’s going on in the game in any particular event.

Starting around 2010/2011 the NHL got much better about including X,Y (location) data with events.  They did it before, but it was more sporadic and generally less accurate (again this is an opinion that I am not going to substantiate).  But even with the better X,Y data – what information do we really have about any particular event from the PbP file? Pretty much just this:

  • Who did it (Acting Player)
  • Who he did it to (Receiving Player)
  • Event type (i.e. Face Off, Hit, Giveaway, Shot, etc)
  • Players on the ice (for both teams)
  • Possibly X,Y data
  • Various minor info on the event (shot type for example)
  • Time of the event

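To make that concrete, here is a rough sketch of a single event as a data structure; the class and field names are mine, not the NHL’s.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PbpEvent:
    event_type: str                         # e.g. "FAC", "HIT", "GIVE", "SHOT"
    acting_player: str                      # who did it
    receiving_player: Optional[str] = None  # who he did it to, if anyone
    home_on_ice: List[str] = field(default_factory=list)  # players on the ice, both teams
    away_on_ice: List[str] = field(default_factory=list)
    x: Optional[float] = None               # location data, when present
    y: Optional[float] = None
    detail: Optional[str] = None            # minor info, e.g. shot type
    seconds_elapsed: int = 0                # time of the event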
Really, anyone who has watched a reasonable amount of hockey will tell you that is not enough information to make a fully educated decision about whether or not that play will lead to the desired result: a goal.  Basic logic tells you – because we see higher shooting percentages (i.e. fewer chances required to score the same number of goals) in shootouts than on powerplays, which in turn have higher percentages than even strength play – that the position of defenders relative to attackers, or defender readiness versus attacker readiness, plays a key role in determining chance conversion.  This is a hypothesis that I will support later on, but you can clearly see from the info in the PbP for a single event that there is absolutely no way to create a proxy for defender readiness (until we get player tracking).  There is, however, a way to look at proxies for defender readiness if we chain the events together and view them as a play instead of a series of distinct and separate events.

Here is where the second key difference between my stats and most other publicly available stats comes in.  All of the events in the database are (theoretically) identical, but my site views and publishes them as a series of plays.  You can link a series of events together as being “for” a team based on who had to control the puck in order for the event to occur.  For example, in order to generate a shot for, you must have the puck.  Similarly, to give the puck away you must have had it to start with.  So by following simple rules you can link events together into plays by inferring who has the puck, and then analyze the events in each play to create a proxy for defender readiness.
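A minimal sketch of that chaining logic (with made-up event types and field names; the real rules handle more cases, like face-offs and zone changes):

POSSESSION_EVENTS = {"SHOT", "MISS", "GIVE", "TAKE"}  # events that imply which team has the puck

def build_plays(events):
    # events: chronological list of dicts with at least 'type' and 'team' keys
    plays, current, current_team = [], [], None
    for ev in events:
        if ev["type"] in POSSESSION_EVENTS:
            if current_team is not None and ev["team"] != current_team:
                plays.append((current_team, current))  # possession changed: close the play
                current = []
            current_team = ev["team"]
        current.append(ev)                             # other events ride along with the current play
    if current:
        plays.append((current_team, current))
    return plays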

So, just as Corsi was originally used as a proxy for possession – because the league doesn’t publish actual possession stats (i.e. time the puck is on a team’s stick) – eGF had to use proxies to determine team readiness.  These proxies are relatively simple to understand and are outlined below:

Proxy – Logic
Length of Play – The longer the attackers have the puck, the longer the defenders have to get into position.
Rebounds – The somewhat random nature of rebounds means that both teams need to reposition in order to adjust for the next opportunity.  If that opportunity comes it is more likely that the defenders aren’t ready, and therefore more likely to lead to a goal.
Rush – If the play was a rush (i.e. you know it started in a particular zone and it moved to the attacking zone quickly) it is less likely that the defenders had a chance to get into position before the opportunity.
Giveaway/Takeaway – Generally, teams don’t expect to have the puck cleanly taken from them, and defenders will not be in an ideal position to react to these changes.
Face-Offs – Contrary to the giveaway/takeaway, teams spend a great deal of time practicing face-off plays and know exactly where to go in the event of a win/loss.  It is therefore likely that team readiness is very high after a face-off.

 

Now all we need to do is prove that each of those proxies does in fact have some type of relationship with the conversion rate for goals, and then we’ll know we have something.  Out of laziness I am just going to use single seasons for data illustration purposes.  You’ll have to trust me that each season tells the exact same story – or don’t trust me – it’s your choice really.
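Checking a proxy is really just a grouped conversion rate; something like the sketch below (with made-up field names) is all that each of the tables and charts that follow amounts to.

from collections import defaultdict

def conversion_by(plays, key):
    # plays: iterable of dicts with a boolean 'goal' plus whatever proxy fields you tagged
    attempts, goals = defaultdict(int), defaultdict(int)
    for p in plays:
        attempts[p[key]] += 1
        goals[p[key]] += p["goal"]
    return {k: goals[k] / attempts[k] for k in attempts}

# e.g. conversion_by(plays, "start_type") or conversion_by(plays, "length_bucket")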

As an aside the process of writing this article has been very helpful to me – I immediately realized a mistake I made in the current iteration of eGF (v3) and have ideas on how to improve it for the future (v4).

So first let’s tackle length of play.  The mistake that I made was not holding more variables constant when examining this (though when I did for v4 – which is not yet released – the relationship held true).  My theory is that the longer a play goes on, the harder it is to score.  This kind of flies in the face of conventional wisdom – which says that possession leads to goals.  While that’s true, when you bury your opponent in their end it becomes difficult to score:

20160108 Length  of Play

Basically, other than small sample weirdness, long plays don’t convert to goals anywhere near as often as quick ones do.  When you consider that longer plays must have more shots in them, this is a somewhat interesting relationship.  The next group we will tackle is the “play starting” group.  A play must start with either a face-off, a shot attempt against, a giveaway/takeaway, or with no data (because the PbP file is not perfect).  The below table illustrates the conversion rates for each group for the 2013/2014 season as compared to the 2011/2012 season (both chosen because it was easy for me to run those queries).

Play Start

The reason I’m not going to show every season is laziness.  The data is amazingly consistent season to season and illustrates a) that I wish we had better data (since ‘no data’ is by far the most common play start) and b) that how the play starts has a reasonable impact on the chance for that play to convert to a goal.  I also won’t bother going into rebound or rush work, since others have already proven that those are valuable contributors to determining conversion success.

Below, however, is a table that should give some idea of the importance of shot quantity in a play (and back up the idea that play length doesn’t improve conversion chance):

SAF

As you can see, the sample size gets relatively small past 3 shots, but adding additional shots to a play does not necessarily improve the chance to score by much.  Which, if you think about it for a minute, means each additional shot lowers SH% and raises SV%.  The very interesting implication is for goaltender analysis – a Randy Carlyle-coached team (for example) would have a very high expected SV%, as they are more likely to give up multi-shot plays against.

Again, in the interests of not re-hashing work others have done: shot location matters a lot.  Everyone knows it and everyone believes it – so I’m not going to go over it.  What I have shown, though, is that there are other factors which may also impact the chance that a play converts to a goal.  So, if you take the basic idea that WOI (and others) have of breaking the offensive zone down into areas, you can grade each shot location from 0 to 3 based on the odds that it converts to a goal (I have also done additional work on shot type by shot location, which seems to make some difference) and then add in those other factors to get the overall quality of a single chance (or play).

All we need to do is design an algorithm that classifies chances appropriately based on the known quantities that we have (play start, length, shot locations, rush, and rebounds).  In effect, that’s all that eGF is: putting those factors into a black box and sorting each chance into a bucket based on the (expected) odds that it will convert to a goal.  There are 6 buckets, 0 to 5, each with a different chance to convert to a goal.

For example, if you have a play that started with a takeaway, was a rush, and contained a rebound and at least one high quality shot, then that was likely a very high quality chance – probably a 5.  Conversely, if you had a very long play that contained 4 or 5 shots from poor locations, that was likely a very poor quality chance – maybe a 1 or a 0.
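In code, that black box could be as simple as a weighted score mapped to a bucket; the weights and cut-offs below are invented for illustration and are not the fitted v3 values.

def chance_quality(play):
    # play: dict of features for one play (illustrative field names)
    score = play["best_location_score"]                 # 0-3, from the shot-location work above
    score += 1.5 if play["rush"] else 0
    score += 1.0 if play["rebound"] else 0
    score += 0.5 if play["start_type"] in ("GIVE", "TAKE") else 0
    score -= 0.5 if play["length_seconds"] > 20 else 0  # long plays convert less often
    return max(0, min(5, round(score)))                 # clamp to the 0-5 buckets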

Before I get to the actual data I have to add a very important caveat.  This assumes an average shooter shooting at an average goalie.  I think it makes no sense to include shooter (or goalie) quality in the calculation for eGF.  In an expected goals model you shouldn’t have the target of exactly matching GF – we already have a stat for that – GF!  What you are trying to do is find players who are putting themselves (or teammates) in good scoring positions while limiting opponent chances.  If you include shooter quality you immediately will find that the players with high historical SH% will be at or near the top of your eGF model all the time, while players who face them will have a higher eGA (all else equal).  To me that doesn’t help you learn about who is playing well or not – it’s basically like putting in a reverse QualComp measure – the harder your comp the worse your eGF% (all else equal)!

I’m just going to throw these charts at you and then explain them in more detail.  The first chart shows the total number of chances by chance quality in each NHL season since the data became (in my view) reliable (2010/2011), excluding 2015/2016 (mostly because I have these charts ready to go).

chance by quality april 3

The first thing you can see is that the absolute number of chances per season is relatively constant (by chance type, by season), and that the counts drop off as you move up the quality buckets.  The lock-out shortened season obviously has shorter bars since fewer games were played.  So, we’re doing well in showing that the vast majority of plays are just garbage where nothing much happens (Q0 and Q1).  The next chart is the absolute number of goals by chance quality by season.

goal by quality april 3

As you can see, despite the huge number of total chances in the Q0 and Q1 buckets, very few result in actual goals and again the absolute number per bucket is relatively constant in each season.  A massive number of the actual goals (>50% if memory serves) are the result of extremely good chances (Q4/Q5), despite the low total number of those types of chances.  The next chart outlines the conversion rates by quality by season.

goal conversion by quality april 3

Again the conversion rates are very steady on a season-to-season basis.  This is key: you need some training data (in this case 2010/2011 to 2013/2014) to fit the model, and so long as the conversion rates stay constant (or semi-constant) going forward, the model will continue to work.  It may need tweaks every once in a while – but it doesn’t rely on the season being finished and then going back over the data in order to function correctly.

So, to recap to this point: basically we view each play as a whole rather than as a series of discrete events.  We then examine as many factors as we can to determine the overall quality of the chance and place that chance into a quality bucket (0 to 5).  We then have a very good idea of the conversion rate for each chance, so we can count the number of chances each team (or player) had and multiply by the appropriate conversion rate to reach eGF.
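The final step is then trivial; a sketch with placeholder conversion rates (not the fitted ones) looks like this:

CONVERSION_RATE = {0: 0.01, 1: 0.02, 2: 0.05, 3: 0.10, 4: 0.20, 5: 0.35}  # illustrative values only

def expected_goals(chances_by_quality):
    # chances_by_quality: dict mapping bucket (0-5) to the number of chances in that bucket
    return sum(n * CONVERSION_RATE[q] for q, n in chances_by_quality.items())

# e.g. expected_goals({0: 120, 1: 80, 2: 40, 3: 20, 4: 8, 5: 3})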

Finally, to illustrate that it works better than Corsi or Shots (as applicable) I am going to show the comparative data for forwards and defensemen in the 2014/2015 season (which is outside my training set).  First the forwards, then the defense:

Forward GF

Defense GA

In the forwards’ case I used shots to explain GF instead of Corsi because it had a much higher R^2 (0.7432 versus 0.68-something for Corsi).  For defensemen I used Corsi instead of shots because it had a slightly higher R^2 (0.6855 versus 0.6844).

I’m sure I should have included all kinds of other comparisons; ultimately, though, all of my data is available on my website and you can make any cross-comparisons you want (i.e. team level, combined seasons, etc.).  If you feel that I should show some other comparison I’m happy to do so – just send me a message on Twitter: @NickAbe.


Player Ratings and Future Performance

Around the time I was leaving university I joined a fantasy hockey league with several of my friends.  The concept of the league was simple – there was a ‘simulator’ (what it actually simulated we don’t know) that would, in theory, compare your roster to your opponents and spit out a boxscore.  Each person would be the GM of their respective team and compete for the Fantasy Hockey League Cup.  Each player in the league was based on a real NHL skater with distinct ratings in the different areas the sim viewed as being valuable to determining the outcome of the game.  The league ran for years and in order to prevent it from deviating too far from the NHL in terms of player abilities we had to ‘re-rate’ the players.  As you can imagine, this is where the arguments started.

In an effort to reduce personal biases in determining the ratings we would use, I developed a statistical method to determine a player’s individual abilities (based on their NHL stats) and applied it to the ratings we used in the league.  While some of the ratings were impossible to derive reliably from stats (available at the time), most were relatively straightforward like shooting (which was basically goal scoring) or passing (which was basically assists), etc.  While this was all for fun and I stopped doing it a few years ago, the concept seemed very applicable when I got more involved in hockey analytics about a year ago.  So, using the new data we have available, I created a player ratings system (not dissimilar to Domenic Galamini’s HERO Charts) and put it on my website (xtrahockeystats.com) about a year ago. 

What I never did is provide an in-depth explanation of how they work or reasonable proof that they do work, which is the purpose of this article.  In the table below is a list of all the ratings and a brief description outlining how they are calculated.  All ratings have a mean of 70 and a standard deviation of 5, which is a throwback to the ratings I generated for the fantasy league.  Obviously I could have picked a mean of 0 and standard deviation of 1 (or any combination of numbers) but I feel like this is pretty easy.  Finally, a lot of the stats are derived from eGF/eGA (expected goals for/against) a measure that I created to estimate the quality of chances (and therefore goals) a player is expected to get/give up.  I will, hopefully, go into greater detail explaining eGF in a different post – but for the purposes of this article just assume it works.

Rating Name – Rating Description

Goal Scoring – GS: Really easy – a player’s ability to score goals.  Obviously this is based on a player’s actual performance, so scoring more goals gets you a higher rating.  This is based off of a player’s 5v5 goals per 20 minutes of ice time and his IPP (individual points percentage).

Passing – PA: Also straightforward, this is a player’s ability to generate offence through passing.  In this case getting more assists gets you a higher rating, based off of total assists per 20 minutes of ice time and IPP.

Offensive Possession – OFF PS: This is more complicated, but basically it boils down to a player’s ability to drive scoring chances for his team.  Based off a player’s ability to generate eGF relative to linemates and relative to the league.

Offensive Awareness – OFF AW: This is the most complicated offensive stat, but certain players are able to convert their scoring chances at a higher rate than others, or help their linemates to.  David Clarkson is a good example of someone with low offensive awareness: even though he might be generating chances, fewer of those chances go in.  Sidney Crosby is a good example of someone with great awareness who needs fewer chances to score the same number of goals as an average player.  Based off of a player’s ability to generate GF higher (or lower) than his eGF, relative to linemates and relative to the league.

Offensive Overall – OFF OV: This is just a weighted average of all the offensive stats, which are then re-centred and redistributed with a mean of 70 and standard deviation of 5.  Meaning, you can’t just sum the stats to determine this rating (i.e. someone with all 72s could end up with a 74 OFF OV because his consistency puts him that far away from the league average).

Defensive Possession – DEF PS: Similar to OFF PS, but instead a player’s ability to reduce scoring chances against for his team.  Based off a player’s ability to reduce eGA relative to linemates and relative to the league.

Defensive Awareness – DEF AW: Again this awareness stat can be a bit strange, but some players seem to be consistently better at preventing goals given the same number of scoring chances against – even compared to their teammates.  Based off of a player’s ability to generate GA higher (or lower) than his eGA, relative to linemates and relative to the league.

Defensive Overall – DEF OV: Similar to OFF OV, but with the defensive stats.  A weighted average of the defensive stats which is then re-centred and redistributed around 70 with a standard deviation of 5.

Durability – DU: This is really just a measure of how much weighted ice time the player had over the previous three seasons.  A first line player with no injuries will have a DU close to 80, but an injury-prone player, like Joffrey Lupul, can still have a DU near 70 because he gets so much ice time when healthy.  One key thing about this rating is you can quickly look at it to determine the reliability of the other ratings: if a player has a low DU (below 70), then the sample size for his other ratings is small – so he may deviate more from those ratings in the future.

Overall – OV: A weighted combination of durability, offensive overall, and defensive overall.  Because of the nature of the game, salaries, and what fans value, offense is weighted the heaviest of the three ratings.  This is not meant to be a catch-all stat.  It is just my personal approximation of a player’s overall, two-way ability relative to the rest of the league.  Basically, an Ovechkin will get a lower Overall rating (compared to a player who has above-average defensive and offensive ability) because his defensive metrics are terrible.
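For anyone curious, the re-centring to a mean of 70 and standard deviation of 5 is just a rescaled z-score; a minimal sketch, assuming you already have a player’s raw value and the league-wide values for that stat:

from statistics import mean, stdev

def to_rating(value, league_values, target_mean=70, target_sd=5):
    z = (value - mean(league_values)) / stdev(league_values)  # z-score against the league
    return target_mean + target_sd * z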

 

So now you have a general understanding of each of the rating categories and how they are calculated.  In terms of the actual nuts and bolts, there is only one key to keep in mind: all of the ratings are generated before they get recorded.  That is to say, they are only ever generated with information that you would have available at that time.  For example, if you were looking at a particular line match-up that occurred in a game in January 2014, the ratings shown for that line match-up would be based off of stats generated prior to that game.  The ratings system never has information it shouldn’t have, and as such any comparative data we create using the ratings should be useful for future prediction.

In effect, a player’s high (or low) ratings are based entirely off of his known history – information we can actually make decisions with.  So if players with higher ratings perform better than those with lower ratings, then we know the ratings are a useful predictor of future performance.  In the case of historical seasons the ratings are updated every fifth of a season based on the new data generated.

Before we get to the data I wanted to explain why I went through the effort of creating these ratings, which can be summed up in one word: context.  One of the problems, as I see it, with hockey analytics is that a stat might get thrown out there (say CF%) and be used as evidence of how good (or bad) a player is.  But on its own the stat doesn’t provide very much value.  You’d need to know his CF% relative to teammates and then relative to the league before making any determination on a player’s “worth”.  These ratings allow you to do that by looking at just one number: want to know a player’s ability to generate chances?  Look at OFF PS.  Want to know if he is able to do the little things that help keep the puck out of the net?  Look at DEF AW.

For anyone familiar with my website (www.xtrahockeystats.com), you know that I have ratings available for every season from 2010/2011 onwards.  The reason I start at 2010/2011 is that I need a sufficient amount of historical data (in this case 2007/2008 through 2009/2010) in order to calculate semi-reliable ratings.  Also, all data used is always for 5v5 play only.  So, to test the effectiveness of the ratings I looked at the combined 5v5 data for 2010/2011 through 2014/2015 and compared how players played against different levels of opponents.

To determine opponent levels I looked at the Offensive Overall and Defensive Overall ratings of the opponents on the ice and weighted them: players are averaged equally within the forward group and within the defense group, and those averages are then combined 70/30 (forwards to defense) for the opponent Offensive Rating and 30/70 for the opponent Defensive Rating.  Simply put, in trying to figure out how good your opponents are offensively, I assumed most of that comes from the forwards, so I used 70% of their average Offensive Overall rating and 30% of the defensemen’s Offensive Overall rating to determine an opponent Offensive Overall rating (and vice versa for the opponent Defensive Overall rating).
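As a sketch (with made-up field names), the opponent ratings described above would be calculated something like this:

def opponent_ratings(opp_forwards, opp_defence):
    # each argument is a list of dicts with 'off_ov' and 'def_ov' ratings for the opponents on the ice
    f_off = sum(p["off_ov"] for p in opp_forwards) / len(opp_forwards)
    d_off = sum(p["off_ov"] for p in opp_defence) / len(opp_defence)
    f_def = sum(p["def_ov"] for p in opp_forwards) / len(opp_forwards)
    d_def = sum(p["def_ov"] for p in opp_defence) / len(opp_defence)
    return {
        "opp_offensive": 0.7 * f_off + 0.3 * d_off,  # offence driven mostly by the forwards
        "opp_defensive": 0.3 * f_def + 0.7 * d_def,  # defence driven mostly by the defencemen
    }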

Each and every 5v5 play was then examined to determine how specific players (and the league) played against varying levels of opponents.  Unfortunately, the data for specific players can vary wildly because the sample size is relatively small.  However, when looking at data for the league as a whole a very simple (and obvious) trend appears: better players get better results.  Or, put the other way, when you are playing against better opponents you get worse results (on average).

Below is the chart for how the league performed against opponents of various offensive skill.  You can see that as the opponents increase in skill, the eGF% and the GF% decrease in an almost directly linear fashion – effectively, the higher the rating the more difficult the opponent and the more you get scored on.

Offensive Ratings

Note that this only controls for your opponents’ offensive abilities, and just because they are poor offensively doesn’t mean that you shouldn’t keep them around.  In the next chart we look at defensive ability and its impact on GF% and eGF%.

Defensive Ratings

You can see that the results are remarkably similar to facing different offensive opponents.  In the future I may include a 3 dimensional view of this chart showing the effect of playing opponents with both variables shown (i.e. their offensive and defensive strength).  For now though, it is clear that employing players who have good Offensive or Defensive overall ratings will improve your team’s results.

I also have goalie ratings, and the relationship between ratings and success still exists, but it is much weaker.  More work needs to be done to determine a better way to predict goaltender success.  But the chart below does validate the method I used to collect and compare the data – you can see that there is no appreciable relationship between eGF% and goalie rating.  That’s exactly what you’d expect, since a goalie can only stop pucks; he doesn’t tend to have a real impact on the number of chances generated for and against his team.

Goalie Ratings

In conclusion, historical player ratings have a very high and direct correlation with future results.  It doesn’t mean that a particular player can’t be highly rated and then turn in a poor season, but in general selecting players with higher ratings will lead to more success for your team than selecting players with lower ratings.  I also feel that I must re-iterate that all of the data used to generate the ratings is historical; there is no inclusion of any kind of forward-looking data.  This means ratings are a very good predictor of future results.

They also have proven somewhat successful when you look at a team’s overall rating to determine their odds of success or failure (i.e. playoffs) in a given season – but that’s the subject for a different article.  If you’ve made it this far thanks for reading and please feel free to hit me with any questions or comments you may have.  I appreciate any feedback or criticism.

 

How to bet with the site

The title is a bit of a misnomer: you can’t place bets through the site – you will need to find your own sportsbook for that – but what you can do is bet with the site, meaning place your bets using the two distinct prediction models that xtrahockeystats.com uses.  This article will explain how I bet; you can follow it if you like.

Obviously any betting is done at your own risk.  I have backtested the data and the percentages are accurate, but that may not hold true going forward.  So just do it for fun – don’t use it as a money making enterprise.

Step 1:

Go to http://www.xtrahockeystats.com/today.php  to see a list of today’s NHL games.

Step 2:

Go to http://www2.dailyfaceoff.com/starting-goalies/ to see a list of the starting goalies for today’s games.  Usually they are all confirmed a few hours before game time, so it is best to wait until then before proceeding to the next step.

Step 3:

Go back to http://www.xtrahockeystats.com/today.php and select the correct starting goalies.  In some cases the goalie won’t be there (emergency call-ups for example or the day of a trade), which just means that goalie has no rating in the system.  In those cases just select a goalie from the drop down that most closely matches his skill (I’d pick someone around 70 for an unknown goalie, or go to http://xtrahockeystats.com/ratings.php to find an existing player’s rating to match it). Make sure to hit update at the top of the page once you have selected the correct starting goalies.  

Step 4:

There are now two key numbers shown beside each team: the “60m model” and “60m rating” win %.  These correspond to the two different prediction methods and give that particular team’s chance to win the game in 60 minutes based on the method in question.  What I do with the data is enter it into an Excel spreadsheet.  I believe each model is equally valid – so I use an equal weighting.  If you enter the data into cells A1 (model) and B1 (rating) for the away team, and C1 and D1 for the home team, then you can use the following formula in E1 (away) and F1 (home) to determine the equal-weighted winning percentage.

Because 60m rating only shows if there are enough games to compare to, it will frequently say 0%.  This formula ignores the cell in that case.

=IF(b1>0,(a1+b1)/2,a1)

Obviously just change B1 to D1 and A1 to C1 when you’re doing the formula for the home team.

The formula for G1 should then be the odds of a tie (after 60m) which is just:

=1-f1-e1

Step 5:

Go to your favourite betting site and get the moneyline odds for each game.  Moneyline odds are a bit weird, so I always convert them to “breakeven” odds.  This just means the percentage chance a team must have to win the game in order to break even (assuming you bet a lot) in the long run.  Assuming your moneyline is expressed properly (+130 or -130 for example) and the away moneyline is entered in cell H1, the following formula (placed in J1, which becomes our away breakeven odds) will convert it:

=IF(h1>0,100/(h1+100),ABS(h1)/(ABS(h1)+100))

Basically it’s the wager over the total payout that determines the odds, but in the case of a negative moneyline you just need to make sure you’re using absolute numbers and that you don’t get confused as to what it means (-130 means you have to bet 130 to get back 230 in total, while +130 means you only bet 100 to get back the same 230).
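If you’d rather sanity-check the spreadsheet in code, the same conversion as the formula above looks like this:

def breakeven_probability(moneyline):
    # e.g. +130 -> 100/230 ~ 43.5%, -130 -> 130/230 ~ 56.5%
    if moneyline > 0:
        return 100 / (moneyline + 100)
    return abs(moneyline) / (abs(moneyline) + 100)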

Step 6:

Choose which team to bet on.  Generally I look at total game odds (i.e. including OT and Shoot Out) because it is a binary outcome and because the odds are the tightest (i.e. minimal house edge).  This means that you have to decide, in the event of overtime or shootout, which team has a better chance to win.  I just assume the game is 50/50 at that point – but you can do whatever you want.

Again assuming the same cells as before and J1 is our away breakeven moneyline odds:

=IF(G1/2+E1>J1,"BET","NO BET")

Basically this cell tells you to bet on the game if the team’s chance to win (treating ties as 50/50) is greater than the breakeven moneyline odds.  Otherwise you shouldn’t bet.

Step 7:

Bet on your favourite betting site as you see fit.  There will be lots of games where it says “NO BET” on both teams; this is because there is a house edge in all sports betting (i.e. they might offer -110 on the away team to win and -110 on the home team to win – implied probabilities that add up to more than 100%).  In these cases, because I like to bet, I just bet on whoever my model says has a higher chance to win – although in theory there is no advantage to doing this in the long run.

Final Notes:

I always bet each game equally (in dollar terms).  I know that I should bet more on some games, because theoretically my model is telling me the odds are better for certain games than for others, but at the end of the day most games in the modern NHL are very close to a coin flip.  If you happen to lose your more heavily bet coin flips it sucks – and while it should work itself out over the long run, the most important thing is to have fun with it, and I didn’t find that fun.

This is exactly how I bet hockey games.  It’s worked out alright for me so far, but again who knows how it goes in the future so don’t risk anything you aren’t comfortable losing.  Also please remember just because a team has a 60% chance to win a game doesn’t mean they will win it… it just means they have a 60% chance to win it.

Proving that the ratings work

One of the challenges when you do anything in analytics is not getting drawn down deep into your own rabbit hole.  By that I mean it’s easy to get so lost in your work that you lose sight of what your goal was when you started.  In my case my goal was to learn more about the game while making it a little bit easier for people (who have less of a math background) to understand hockey analytics.  That’s why I created player ratings: it is an easy-to-understand system (higher numbers are better and indicate more skill in a particular area) that allows for easy(ish) testing of hypotheses.

Of course, that assumes that you don’t get lost in your own work.  I got so focused on developing the system that I never bothered to use it in the way that I intended.  So this is the first (of hopefully many) articles explaining both a) that the ratings work and are valuable and b) some practical applications for the ratings.

Since this is the first real article outlining the value of the ratings, I am going to start simple.  One of my earliest hypotheses about hockey (which is probably biased because I am a defenceman in beer league – by virtue of my ability to skate backwards) is that a strong defense trumps a strong offense.  At least in terms of your ability to outscore your opponent.  So I wanted to test this hypothesis while simultaneously illustrating that the ratings do in fact work.

Goals: 

  1. Show that the ratings correlate with success across various measures, such as eGF%, GF%, CF%, and RF% (rebound for %).
  2. Show that strong defense is better than strong offense.

Methodology:

  1. I looked at all shot attempts during the 2014/2015 season at 5v5 (goalies in) play for this test.
  2. All players on the ice for these attempts were rated – based on their historical play PRIOR to the shot attempts.  This is key because this model uses only information you have at your disposal at the time.
  3. They were grouped into Forwards, Defence, and Goalies and their average ratings were used per position.  This means that if a team had a 74OV, 78OV, and 76OV forward group on the ice, it would average out to 76OV.
  4. I (somewhat arbitrarily) determined that a rating of 67 or below was “bad”, 68 – 72 was “average”, and 73+ was “good”.  This isn’t entirely arbitrary: the ratings are centered at 70 with a standard deviation of 5; however, the averaging effect reduces the standard deviation of each group.  This means that the 2.5 points (on either side of the mean) I used to delineate bad and good roughly correspond to one adjusted standard deviation.
  5. Each shot attempt was then put into a basket based on the quality of the forwards’ offensive ability (defined as offensive overall in the ratings table) versus the quality of the defence’s defensive ability (defined as defensive overall in the ratings table).  For example, if the forward group was 74OFFOV and the defensive group was 69DEFOV then the attempt went into the GOOD FORWARD/AVERAGE DEFENSE bucket. 

As a result there are 9 possible buckets into which each attempt could fall.  What we would hope to see, if the ratings are good, is a positive relationship between forwards’ offensive abilities and their various metrics (GF%, eGF%, CF%, RF%), and a negative impact on those same metrics based on the quality of their opposition.
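A minimal sketch of the bucketing, using the cut-offs from step 4 of the methodology (the function names are mine):

def tier(avg_rating):
    if avg_rating <= 67:
        return "BAD"
    if avg_rating <= 72:
        return "AVERAGE"
    return "GOOD"

def bucket(forward_off_ov, defence_def_ov):
    # each argument is the list of on-ice ratings for that position group
    fwd = tier(sum(forward_off_ov) / len(forward_off_ov))
    dfc = tier(sum(defence_def_ov) / len(defence_def_ov))
    return f"{fwd} FORWARD / {dfc} DEFENSE"

# e.g. bucket([74, 78, 76], [69, 70]) -> "GOOD FORWARD / AVERAGE DEFENSE"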

forwardsvsdefense

What you can see from the graph above is that the relationship holds true.  Using only information available to us prior to the events occurring, we can show that forwards with higher offensive ratings perform better than those with lower ones.  We can also show that defencemen with higher defensive ratings outperform their lower rated counterparts.

Having good defensive defencemen on the ice skews each category strongly in that team’s favour.  While they still get outscored and outchanced when facing good offensive forwards, they strongly outperform average and bad offensive forwards to the point where in 2014/2015 defensive defencemen outscored their opposition by almost 10% (990 to 910).  While good offensive forwards also outscored their opposition (and by a wider margin 1400-1150) it should be noted that good offensive forwards are expensive (in terms of cap hits) and good defensive defencemen can have a similar impact on your team.

I should also add that the GF% (and all the other analytical measures) shown by the defensive defenceman group is independent of other factors – like how good their own forwards are.  In a later post I will examine how good those forwards are – but the fact that defensive ability alone is enough to generate such a strong positive GF% at 5v5 is worth considering.

forwardsvsdefensetable

Team Ratings

I have taken the next logical step with my player ratings and created team ratings.  These ratings are based entirely on 5v5 play (just like the player ratings) and how the coach uses those players.

The calculation of the ratings is relatively straightforward – I look at each shift a team has and then I create an average (for that shift) of their forward, defense, and goalie ratings.  Over the past 10 games (or fewer if there are fewer than 10 played in the season) that data is then averaged out to come out to a team rating.

I feel it’s important to generate the data this way because it looks at actual usage by the coach.  For example, it wouldn’t do much good to the 2012/2013 Maple Leafs if MacArthur had a high rating but wasn’t used much.  It also automatically gives a higher weighting to a 1st line player (assuming he gets more ice time) than to a 4th line player.

The overall rating for the team is determined by taking a 30% weight of the goaltending rating, a 35% weight of the defence, and a 35% weight of the forwards.  Pretty straightforward.
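As a sketch (assuming each shift record already carries the averaged forward, defence, and goalie ratings), the team overall is just the weighted average described above:

def team_overall(shifts):
    # shifts: list of dicts with 'fwd', 'def' and 'goalie' averages for each shift in the last ten games
    n = len(shifts)
    fwd = sum(s["fwd"] for s in shifts) / n
    dfc = sum(s["def"] for s in shifts) / n
    gol = sum(s["goalie"] for s in shifts) / n
    return 0.35 * fwd + 0.35 * dfc + 0.30 * gol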

The player ratings themselves (as previously discussed in another post) are based on the historical performance of the player in 5v5 situations.  The weightings (at this point in the year) are based on 20% 3 seasons ago, 30% 2 seasons ago, and 50% last season.  Once we pass the 20% mark of the season that will change to 30% 2 seasons ago, 50% last season, and 20% this season.  The weighting will then subsequently move towards that original weighting (20, 30, 50) as we get closer to the completion of this season.

The ratings have a base of 70 (meaning average) with a standard deviation of 5.  However, it is exceedingly unlikely for a team to have an overall rating 5 points higher than 70 because it is averaged out across the whole team.  Because of the averaging, the standard deviation (at the team level) shrinks dramatically, and having an overall rating just 1 point higher than another team is very significant.  Also, I didn’t redistribute the results – meaning the average for team OV won’t actually be 70.  This is due (primarily) to the fact that better players play more than worse players, and the better goalies tend to play more.  So it is possible for every team to have a starting goalie with a rating over 70.

20152016 Best Teams

As you can see in the chart – it is much easier for teams to have a wider divergence on goalie skill (since only one or two goalies play) compared to forward or defence skill.

Finally, NONE of the data shown for any season uses data that I shouldn’t have.  The ratings are generated each and every game (starting at the 20% mark of the season) and then stored in the database.  This means that if you put in a different end game on the ratings page, the rating will be different for a particular team.  In this way you can see how player performance, usage, and trades have impacted a team to make them stronger (or weaker).

I will be adding charting shortly.

 

The Impact of Cheating

I think one of the biggest issues in any kind of analytics is trust.  There is lots of data presented by various sources and all we get to see is the end results.  That’s one thing when you’re dealing with just raw data, such as what NHL.com provides in their play by play summaries.  It is quite another when you’re dealing with more complex algorithms (such as goals above replacement and expected goal models).

The issue arises when you create these algorithms or models.  As an aside, my experience with cheating in models actually comes from my finance background: I can’t tell you how many times someone would pitch a model that “backtests perfectly”.  The reason it backtests perfectly is because you have access to all the data – everything occurred in the past, by definition, in a backtest.  What we are interested in is how the algorithm will perform in real market conditions (i.e. when you don’t know what will happen).  In the case of stock trading algorithms it becomes easy to turn your machine off when something bad is going to happen to your model – because you know when something bad is going to happen.  In expected goal models (which I will use as an example for the remainder of this article) it is also much easier to have a “good” model because you know what happened.

To put it another way, if you created an expected goal model for 2013/2014 then the sum of your expected goals will equal the sum of actual goals scored (within some reasonable rounding difference).  Why? Because if your expected goals were higher then you’re just over-estimating (or vice versa for lower) goals – and your model would be inherently flawed.  The problem arises when you use your 2013/2014 model and apply it to 2014/2015.  If you didn’t change any of your variables would the sum of your expected goals still equal the sum of actual goals?  That is one mark of a “good” model: if it still fits without needing any “final” (i.e. which shot attempts actually resulted in goals) information – just the raw inputs (i.e. shot location, rebounds, etc).

But we create all these models and do all this work as a way to predict the future right? In my case, the answer is “sort of”.  I don’t think you can actually predict the future in hockey because there are way too many variables.   Any model that you create to predict future performance for a player or team can easily be impacted by usage, injury, opponents, opponent injury, ice conditions, etc, etc.  All we can do (in my opinion) is create something that highlights which players were previously valuable to their team in the hopes that they will be valuable again.

To bring it all back to cheating, the problem with creating these models is then the inevitable backtest.  We put all kinds of work and effort into the model but we can’t prove it’s good unless we can show that it does something useful or something better than what’s currently out there.  I read an article written by someone else who had the good idea of using his model at various points in the season to predict how a team (or player) will do for the balance of the season.  The problem is he used some end of season data (data that you wouldn’t have in real world conditions) to create and backtest the model.  Unfortunately, that invalidates a lot of the work that is done because it yields results in the backtest that are materially better than what could otherwise have been achieved.

To demonstrate this I ran the same backtest with my expected goals model.  At various intervals through the season (20% of GP, 40%, 60%, 80%) I used the data available to me to predict the GF% for a team for the remaining games of the season (i.e. at 20% a team will have played about 16 games and I try to correlate stats I have available with how they perform for the remaining 66).  Below is a chart.

 

Correlations - No Cheating

Shot attempt percentage (SA%) beats my expected goals for model at each interval of the season in predicting the remaining games’ GF% for a particular team.  I have a couple of guesses as to why – but that is the subject for a different article.  The model I created does not take player skill into consideration – in fact it intentionally doesn’t.  The reason is that I am trying to isolate player skill by saying that a particular player had an expected goals for of X but actually scored Y, and therefore may be a high skill player.  However, if I am using the model to try and predict future GF% then it would be good to incorporate some “skill” into the model.

In the chart below I incorporated player skill in two ways.  One is with actual data available to me at the time (shown as ‘aEGF%’ in the legend) and the other is with cheating data that wouldn’t be available to me (shown as ‘cEGF%’ in the legend).  The actual-data version just looks at how much a team has beaten their EGF by and then transforms the EGF based on a multiplier, which is why it says 0.1 aEGF% in the legend.  That means that I applied 10% (you could say regressed) of the amount they beat their EGF by to create a new value, 0.1 aEGF% (0.3 aEGF% is just applying 30% instead, etc.).  The cheating version looks at how much they beat their EGF by during the entire season and then uses that to predict the remaining games.
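One plausible reading of those two transforms, as a sketch (function and argument names are mine): the honest version only uses the over/under-performance observed so far, while the cheating version borrows the full-season figure you wouldn’t actually have yet.

def adjusted_egf(egf_to_date, gf_to_date, weight):
    # aEGF: shift EGF by a fraction of the over-performance seen so far
    return egf_to_date + weight * (gf_to_date - egf_to_date)

def cheating_egf(egf_to_date, season_gf, season_egf, weight):
    # cEGF: same idea, but using the full-season over-performance (information from the future)
    return egf_to_date + weight * (season_gf - season_egf)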

You can immediately see why using the cheating data is a problem.  My variables now include part of the solution.  I know how much they beat their EGF by for the entire season – so applying that transform to their known EGF at each interval still gives me a huge edge as shown in the chart below.

Correlations - Cheating

While the chart is very ugly, you can see (sort of) that even using just 10% of the cheating information applies enough of a transform to the data to make my EGF% correlations superior to SA%.  Each subsequent increase in the amount I allow myself to cheat (.3, .5, .7, and 1.0) skyrockets the correlations – at each interval.  There is no point where cheating more isn’t more beneficial to presenting my data as “good”.  There is, however, no such improvement seen in using actual data.

There may be a case that using some actual data (i.e. the 0.1 aEGF) actually makes the model better at predicting future GF%, but the more I apply the more my EGF model begins to resemble just plain old GF.  Since actual GF% is the worst predictor of the three, the more I apply the worse off it seems to get.  So what this means is: if I created the model using the cheating method and got fantastic results, I would expect that in the future, when I couldn’t cheat, I would also get fantastic results.  The reality, though, is that the results with actual information don’t differ that much from my original non-transformed results.  So in effect, I would have over-estimated the importance of beating the model in predicting future results and therefore had a less effective model to predict “real” future results.

For those of you who are more numbers-inclined, the data behind the charts is presented in the table below.

Correlations - Table

 

2015 First Round Western Conference Playoff Predictions

So what good is an advanced stat if it can’t help you make some playoff predictions?  Here are my picks for the first round of the NHL Playoffs, based primarily on each team’s 5v5 eGF stats.  The reason I look at 5v5 and not (usually) special teams is that I feel 5v5 becomes much more crucial in the playoffs, and the team that can dominate there tends to win.

Western Conference

1. Anaheim vs 8. (Wildcard) Winnipeg

Possession: Despite the huge difference in standings (well I guess 10 points isn’t that huge), Winnipeg actually had better eGF possession numbers than Anaheim at 5v5, although both were comparable.  The only reason Winnipeg was fighting for that last spot is the penalties they take (oops… already broke my own rule about talking special teams).  Both teams generate nearly identical offensive chances and allow similar defensive chances.  Edge: None

Skater Skill: Anaheim finishes on slightly more of their chances, likely due to the duo of Perry and Getzlaf.  If Winnipeg is able to shut these two down, they will have a good shot at advancing.  Winnipeg, on the other hand, doesn’t really rely on outright skill to score.  They generate chances and score at a more or less average pace.  Team scoring over individual skill.  Edge: Anaheim

Goaltending: Winnipeg has the better goalies.  Hutchinson and Pavelec combined to reduce goals against below expectations, while Andersen more or less matched expectations and his backups suffered.  Edge: Winnipeg

Overall: It’s really close, but I’m going with Winnipeg.

2. Vancouver vs 3. Calgary

Possession: Neither team is anything to write home about possession wise, with Vancouver ending the season with just a 47.8% eGF and Calgary with a slightly worse 46.0%.  Vancouver has been playing better since the deadline, at least in terms of possession, and seems to have improved their defensive game especially. Edge: Vancouver.

Skater Skill: Vancouver has the twins, but outside of them they don’t seem to have anyone capable of scoring more often than expected (given the chances they get).  Calgary, on the other hand, seems able to score well above their expectations.  Whether this is luck or skill, one season might be too small a sample to tell.  Edge: Calgary.

Goaltending: The only reason Calgary is even in the playoffs is Jonas Hiller.  He was outstanding all season and there’s no reason to think that will change.  Ryan Miller and Eddie Lack didn’t combine to do anything special in net for Vancouver this year, and I’m not sure why that will change either.  Edge: Calgary.

Overall: This is literally a crapshoot, but I’m picking Calgary.

1. St. Louis vs 7. (Wildcard) Minnesota

Possession: Both teams are powerhouses in possession, with Minnesota (53.9% eGF) edging out St. Louis (52.35% eGF).  Minnesota spent the first half of the season dominating play but still not getting the wins, as they were in desperate need of a goalie.  Since Dubnyk arrived they have continued to dominate play and can’t seem to stop scoring and winning.  St. Louis, on the other hand, started off looking unstoppable but has seemed much more beatable of late as their offence has gone quiet.  Edge: Minnesota

Skater Skill: Both teams beat their eGF, but Minnesota (0.84GF vs 0.76eGF) did it slightly more convincingly than St. Louis (0.81GF vs 0.78eGF).  However, Tarasenko and the Blues’ blue line force me to give the edge to St. Louis.  Edge: St. Louis.

Goaltending: Here is where it gets very interesting.  Backstrom and Kuemper played horribly for Minnesota, but Dubnyk put together a season that could have been Hart-worthy (if not for the great seasons from Price and Ovechkin).  He allowed just 0.53GA per 20 minutes on expectations of 0.63GA per 20 minutes.  So even if he doesn’t play great (and just plays average), Minnesota is a tough team to score on.  St. Louis, on the other hand, got much more consistent goaltending all year from Jake Allen and Brian Elliott, both of whom also beat expectations (albeit just slightly compared to Dubnyk).  If Dubnyk stays healthy it’s hard to see how this is even close, but if Backstrom or Kuemper have to go in, all bets are off.  Edge: Minnesota.

Overall: Minnesota for sure… unless Dubnyk gets hurt.

2. Nashville vs 3. Chicago

Possession: Gone are the days when Chicago simply dominated possession, as Nashville actually had the better eGF% (54.17% vs 51.52%) by a pretty decent margin.  Even more troubling is that since about the trade deadline the Hawks have become a negative possession team (eGA > eGF) that has had to rely on goaltending to win games.  Edge: Nashville.

Skater Skill: Again, this one isn’t even close, as Nashville has managed to score more often than expected while Chicago scored substantially less often.  The loss of Kane probably hurt them here, but at least their scoring has picked up a bit in the past dozen games or so (5v5).  Edge: Nashville.

Goaltending: Here is the only reason Chicago is even in the playoffs.  All of their goaltenders played well above expectations this season, combining to allow just 0.64GA per 20 minutes against expectations of 0.8.  Pekka Rinne had a much easier time in net, allowing 0.6GA per 20 minutes against expectations of 0.66.  All awesome numbers, but I have to give the edge to Chicago for reducing goals against by 20% versus expectations.  Edge: Chicago.

Overall: It pains me to say it, but Nashville.  I’m a big fan of Toews, so if anyone can help Chicago beat the odds it’s him, but things aren’t looking good for the Hawks.


eGF Algorithm Update

One of the key goals I had with my eGF model was to explain as much as possible about the “how” of NHL goal scoring.  To that end, a perfect model or algorithm would see low quality chances have both a very small conversion rate AND a small total number of goals.  The version of eGF I first released on this website saw (in my opinion) too high a concentration of the absolute number of goals in the “middle” of the quality curve.  That is, despite low-medium conversion rates, quality 1 and quality 3 chances accounted for a large number of absolute goals due to the massive volume of those chances.

By re-writing the algorithm to take into account all of the information I have found in my research (shooting areas, rushes, rebounds, time effects, and face-off effects), I have made the model closer to “perfect” (though with the limited amount of information we have, we will never have a perfect explanation).

Percentage of Total Goals by Chance Quality and Version

As you can see from the table above, eGF v2 had a lot of goals (as a percentage of the total) coming from the low-medium quality buckets (68.2%), whereas eGF v3 now has just 48% of goals coming from low-medium quality chances.  While this hasn’t dramatically improved the correlations of the model, it meets my goal of trying to understand exactly how goals are scored at the NHL level.  Under eGF v3, your team is not likely to be successful unless it can generate high quality scoring chances, something that also passes the “eye” test.
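For reference, this kind of table can be reproduced from chance-level model output with a simple groupby.  The column names below (“quality”, “is_goal”) and the tiny sample are placeholders, not the actual eGF event data.

```python
import pandas as pd

# Tiny illustrative sample of chance-level output: one row per scoring chance,
# with its quality bucket and whether it became a goal.
chances = pd.DataFrame({
    "quality": [0, 1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5],
    "is_goal": [0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1],
})

goals_by_quality = chances.groupby("quality")["is_goal"].sum()
pct_of_total_goals = 100 * goals_by_quality / goals_by_quality.sum()
print(pct_of_total_goals.round(1))   # share of all goals coming from each quality bucket
```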

One of the other goals of my algorithm was to remove as much “noise” as possible from the games and focus on just the important events.  You could create a model that “explains” goals by simply lumping all (or most) of the opportunities into one or two buckets.  eGF v3 has effectively filtered down the number of chances in each bucket as well, such that low quality chances make up the majority of chances (despite not accounting for the majority of goals).

Total Chances by Chance Quality per NHL Season

So as you can see from the table above, the two most frequent goal types (Quality 4 and Quality 5) are also two of the most infrequent types of chances.  Quality 3 actually has a relatively low number of chances as well, which I believe is because a Quality 3 chance is difficult to generate, but once generated it is likely to create further opportunities (rebounds, etc.) that lead to Quality 4 and Quality 5.

The model also needs to be consistent from season to season.  That means each chance quality should have more or less the same goal conversion rate every year, and each step up in quality should result in a higher conversion rate.
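Here is a sketch of that consistency check, assuming per-season totals of chances and goals by quality bucket (the numbers are made up; only the shape of the check matters): conversion rates should be roughly stable across seasons within a bucket, and strictly increasing as quality goes up.

```python
import pandas as pd

# Made-up per-season totals by chance quality; in practice these come from the model output.
totals = pd.DataFrame({
    "season":  ["2013/2014"] * 6 + ["2014/2015"] * 6,
    "quality": [0, 1, 2, 3, 4, 5] * 2,
    "chances": [9000, 8000, 5000, 2500, 1800, 900,
                8800, 7900, 4900, 2400, 1750, 880],
    "goals":   [30, 120, 250, 300, 450, 350,
                28, 110, 240, 280, 430, 330],
})

conv = (totals.assign(rate=lambda d: d["goals"] / d["chances"])
              .pivot(index="quality", columns="season", values="rate"))
print(conv.round(3))

# Each step up in quality should convert at a higher rate, in every season.
assert (conv.diff().dropna() > 0).all().all()
```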

Goal Conversion Rate by Chance Quality

Finally, the absolute number of goals scored by quality should remain somewhat constant as well.  There really isn’t a strong reason for this, other than that it helps convince me of the validity of the model.  The game doesn’t change that much from year to year, so a high quality chance in 2010 should still be a high quality chance in 2015; furthermore, teams should still have the goal of generating higher quality chances for themselves while limiting them for opponents.  So on the whole you’d expect total goals by chance quality to remain fairly static through the years.

Total Goals by Chance Quality

The data appears relatively consistent in this regard as well.  Since the 2014/2015 season isn’t done yet I won’t make any final claims, but it does appear that goal scoring is down a bit this season.  Obviously 2012/2013 is lower due to the lockout.

In conclusion, the new version of the eGF model does a much better job than the previous version in determining what a high quality chance is.

Explaining the Leafs… maybe

The 2014/2015 Leafs are something of an enigma to anyone who believes in advanced stats.  At the time Randy Carlyle was fired (game 590, if you want to run the numbers yourself) the Leafs were 3rd last in eGF% (44.38%) but 17th in GF% (49.71%).  Since then the Leafs have improved their possession numbers to 20th (49.06%) but dropped to 2nd last in GF% (37.25%).  Whether you use Corsi, my eGF, or some other version of scoring chances, the drop in goal production combined with the increase in goals allowed is baffling.

The problem with all of hockey’s advanced stats is that they come from one source (well, three, but really mostly the one): the NHL play-by-play summaries.  The issue is that while every event is logged in the PbP summary, there is very little else there.  If, for example, it’s a shot, all we know is the approximate location of the shot and who took it.  We can also look at the events preceding the shot to try to determine some level of quality for the chance, but a lot of the most important factors for scoring goals are unknown to us.

Because of these limitations, we don’t know if a shot from the low slot (which has a very high probability of turning into a goal) was a “clean look” (i.e. no defenders in the way) or if it had to be shot through a sea of legs.  An impeded view can work against the goalie, but it also works against the shooter: there are only so many places on the net he can put the puck if there are bodies in his way, which reduces the amount of net the goalie needs to cover to make the save.  Or maybe the shot was partially blocked, or its speed reduced because a defender had knocked the shooter off balance.  The point is, we can estimate the quality of any given scoring chance from the PbP data, but until player tracking is available the estimate will always be very rough.

My initial research has shown that the number of defenders in the defensive zone actually has a huge impact on goal scoring (more to come on that later).  While this makes logical sense, it is impossible to quantify with our currently available information.  A decent proxy might be face-offs, which I discussed at length in a previous post.

The reason, in my opinion, that defensive zone face-offs rarely result in goals against (at even strength) is that there is a plan in place for losing a face-off in your defensive end.  All five players know where to go, and the defending team has the advantage of starting with all of its players in the defensive zone.  A rush, or counter-attack, on the other hand, is more likely to result in a goal against because the defending players are moving from offense to defense and not all of them are necessarily in position to defend yet.  Basically, it’s a five-second power play.

So with all that said, let’s just admit that with the 2014/2015 Leafs we cannot come up with a definitive statistical measure that would explain their lack of goal production and their suddenly much worse goaltending, at least not with the information currently available.  What we can do is assume that Randy Carlyle is a wizard, or maybe that he inherently knew the team he had wasn’t that strong.

Here’s where it gets a little bit interesting.

What we know from watching the Carlyle Leafs is that they spent a lot of time in their own zone.  We also know that another team is less likely to score on you when you have all five of your skaters defending in the zone.  So, in theory, the other team will chalk up a lot of shots, which eGF, for example, will still view as good quality chances, but your goaltenders will likely save more of them than you would expect, because most measures don’t account for the reduced shooting percentage when all five defenders are in the zone.  Finally, we know that when the Leafs did counter-attack, it was generally on “rushes” (again, hard to quantify with our existing info) where the other team did not have all five defenders in the zone.

We can then further assume that, under Randy Carlyle, the Leafs would enjoy a GF > eGF and a GA < eGA.  This is, for the most part, true: over the entire 2012/2013 and 2013/2014 campaigns the Leafs had an eGF/20 (expected goals for per 20 minutes of 5v5 ice time) of 0.69 but actually scored at a GF/20 of 0.81.  Their eGA/20 was 0.86 versus a GA/20 of 0.84.

The problem, as we all know, is that this strategy didn’t seem to work in the second half of the season.  My theory, again unprovable, is that this isn’t a conditioning or “worn out” issue, but rather that other teams take a certain number of games each year to hone their defensive game.  The mistakes the Leafs capitalized on in the first half of the year might not be available in the second half.  It doesn’t necessarily show up in the stats, as we really only have one full Carlyle season to look at.

Also, it is important to note that I don’t think the Leafs at any point in the past decade or so have been a “good” or even “average” team (if you include goaltending).  This 2014/2015 team, in particular, was probably going to get outscored by a decent margin at even strength over the course of the year under any system.

What we do know

Now, if you can take the leap of faith that all of the above assumptions are true, we do have some concrete numbers to look at in terms of winning games with a “bad” team.  Let’s further assume that the Leafs had an even-strength goal differential expectation of -0.2 per game.  If your goal was to actually win, then you might well follow Carlyle’s strategy.  Why?  Variance.

The 2013/2014 Leafs scored around 1.9 goals per game and allowed 2.1.  The thing about hockey, though, is that you can’t score partial goals.  All that matters in any given game is whether or not you score more goals than your opponent.  If you employ a system that limits the variance on your goals allowed, bearing in mind that you’re going to allow about 2 a game (at even strength), then you know what it will take to win.

Goals Scored | Expected Points
0 | 0
1 | 0.5
2 | 1.25
3 | 1.5
4 | 2
5 | 2
As you can see from the table above, if you score 4 or more goals at even strength you’re expected to earn 2 points.  This isn’t based on anything other than intuition; I could do the research, but this is just to showcase how a play style might be more effective at earning points when you have a bad team.  I don’t think “researched” values would be materially different.

Now we can break up our “for” events by quality of chance.  We know we are going to score 1.9 goals per game on average, but how we score those goals matters.  If, for example, we throw lots of pucks at the net and convert at 3.5%, we would have about 54 “for” events in a typical game (1.9 / 0.035 ≈ 54).  If we instead wait and try to pounce only on opportunities that convert at 15%, we would have just 13 events in a game.

Chance Quality (conversion rate) | 0 Goals | 1 Goal | 2 Goals | 3 Goals | 4 Goals | 5 Goals
0.85% | 12.12 | 23.27 | 22.24 | 14.11 | 6.68 | 2.52
3.50% | 11.98 | 23.45 | 22.54 | 14.17 | 6.55 | 2.38
7% | 11.56 | 23.49 | 22.98 | 14.42 | 6.51 | 2.25
15% | 9.91 | 22.74 | 24.08 | 15.58 | 6.87 | 2.18
20% | 8.80 | 22.01 | 24.76 | 16.51 | 7.22 | 2.17

The table above shows the binomial distribution of games (in an 82-game schedule) in which you would score 0, 1, 2, 3, 4, or 5 goals, based on the quality of your chances (see the sketch after the points table below for the calculation).  What you can see is that the higher the quality of your chances, the fewer games there are in which you score zero goals (at even strength).  Remember, the expected goals for the season are the same under each quality.

If we convert that into points:

Chance Quality (conversion rate) | Expected Points on the Season
0.85% | 81.12
3.50% | 80.87
7% | 81.21
15% | 84.2
20% | 86.55
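For the curious, here is a minimal sketch that reproduces both tables above.  It assumes goals in a game follow a binomial distribution with n = round(1.9 / conversion rate) chances per game, uses the intuition-based points table from earlier, and treats the (rare) games with 6+ even-strength goals as 2-point games; under those assumptions the output lines up with the figures above.

```python
from math import comb

GAMES = 82
GOALS_PER_GAME = 1.9                    # season-long even-strength scoring rate
POINTS = {0: 0.0, 1: 0.5, 2: 1.25, 3: 1.5, 4: 2.0, 5: 2.0}   # expected points by goals scored

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k goals from n chances converting at rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

for p in (0.0085, 0.035, 0.07, 0.15, 0.20):
    n = round(GOALS_PER_GAME / p)                       # chances per game needed to average 1.9 goals
    games = [GAMES * binom_pmf(k, n, p) for k in range(6)]
    points = sum(g * POINTS[k] for k, g in enumerate(games))
    points += (GAMES - sum(games)) * 2.0                # games with 6+ goals are worth 2 points
    dist = "  ".join(f"{g:5.2f}" for g in games)
    print(f"{p:6.2%}  games by goals 0-5: {dist}   season points: {points:.2f}")
```

The pattern is the same as in the tables: with the season-long scoring rate held fixed, the higher-conversion styles produce fewer zero-goal games and a few extra standings points.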

What we can see is that a “Homer Simpson boxing” strategy (for lack of a better term) might actually earn your team more points over the season, assuming you have an inferior team to start with.  Basically, variance is your friend on offence and your enemy on defence.  So any system that increases offensive variance while decreasing defensive variance will earn your team more points.

This is probably why Horachek is having so many problems.  The Leafs are still a bad team (worse since he took over, since they traded away a lot of their talent); they just no longer run a system designed to increase offensive variance.