
Stats, Lies and Eyes – revisited

The granddaddy of TrueHoop, Henry Abbott, recently wrote:

Since I first interviewed the brilliant Dan Rosenbaum… — the first time I encountered advanced stats — I always felt that the basketball world and all of its followers were on an inevitable path to embracing all of this and more. The information is simply so valuable; it’s the natural evolution of things.

And later in the article (emphasis added):

I know a number of stat experts who work for NBA teams. Many, but not all, live with the walking disappointment at how little their employers embrace, or even understand, what they’re saying.

My recommendation would be: Stop showing them spreadsheets. They’re never going to make big decisions that way. It’s not how most people think.

Use the stats and the spreadsheets more than ever, but behind-the-scenes, like plumbing — not as the final element in the presentation.

Bingo. A vocal few didn’t like my criticisms (see “Stats, Lies and Eyes”) of metrics like PER, Wins Produced and adjusted +/-. While I use them often, I also understand their limitations.

It’s this simple:

Advanced statistics have the ability to “see” every single possession – a GM/coach/scout/fan does not. Advanced statistics are emotionally unbiased – a GM/coach/scout/fan is not. Yet a GM/coach/scout/fan can see that player X alters shots more often than others, forcing more misses – while most advanced metrics give credit only to the player who rebounds that shot and none to the defender. A GM/coach/scout/fan can see player Y setting hard picks that lead to clearer lanes to the basket – while most advanced metrics give credit only to the player who made the basket.
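To make the allocation problem concrete, here is a toy sketch in Python. It is not the formula behind PER, Wins Produced or any other published metric: it simply contrasts box-score-style credit for a defensive stop (all of it to the rebounder) with a split rule that also credits the defender who altered the shot. The `DefensiveStop` record and the 0.4 `contest_share` weight are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DefensiveStop:
    contester: str  # defender who altered the missed shot
    rebounder: str  # player who secured the rebound


def box_score_credit(stop):
    # Box-score view: the rebounder gets all of the credit for the stop.
    return {stop.rebounder: 1.0}


def split_credit(stop, contest_share=0.4):
    # Hypothetical split: the contester gets contest_share of the stop,
    # the rebounder gets the remainder (same player can receive both parts).
    credit = {stop.contester: contest_share}
    credit[stop.rebounder] = credit.get(stop.rebounder, 0.0) + (1.0 - contest_share)
    return credit


def season_totals(stops, rule):
    """Sum each player's credit over many stops under a given rule."""
    totals = defaultdict(float)
    for stop in stops:
        for player, share in rule(stop).items():
            totals[player] += share
    return dict(totals)
```

Run over a season of stops where player X contests and player Y rebounds, the box-score rule gives everything to Y, while the split rule moves part of that credit to X. Picking the weight is exactly the hard part that the eyes (or tracking data) have to inform.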

The punchline: marry the two. You cannot make the best decisions without consulting both.

“I’m not smart enough to do the math.” – Jeff Van Gundy at the MIT Sloan Sports Analytics Conference

If you have good tools, use them. You may have “good eyes”, but you will not see every play by every team in every game. And you may endorse a certain metric, but you also need to understand its limitations.

I love chatting about this topic with Raptors journalists (there are some really great ones in this city), announcers and even occasionally team staff. As you can imagine, their views on advanced statistics range from “it’s useless, I trust my eyes” to “I love the stuff”. When asked my view (they usually assume I’m very biased, for obvious reasons), I sometimes surprise them by saying “I would never draw conclusions from the numbers alone, but in many cases they help you ask the right questions.” What do I mean? For example: “Why does player X have a higher Wins Produced and Advanced Statistical +/- than player Y?” or “Why is this lineup better than that lineup when my intuition says the opposite?”
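One common way to put a number on the lineup question is net rating: points scored minus points allowed, per 100 possessions. A minimal sketch, assuming you already have point and possession totals for each lineup; the `net_rating` helper is hypothetical and the figures below are made up.

```python
def net_rating(points_for, points_against, possessions):
    """Point differential per 100 possessions for one lineup."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * (points_for - points_against) / possessions


# Made-up totals for two lineups, for illustration only.
lineup_a = net_rating(points_for=210, points_against=200, possessions=200)
lineup_b = net_rating(points_for=95, points_against=100, possessions=100)
print(f"Lineup A: {lineup_a:+.1f}, Lineup B: {lineup_b:+.1f}")
```

A gap like this is the prompt to ask “why?” and head to the video; with small lineup samples it is not yet an answer.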

The New New Things

The annual MIT Sloan Sports Analytics Conference was recently held in Boston, and it is a great source of new ideas. One of my issues with some commonly used metrics like PER and Wins Produced is that their authors spend more time criticizing each other than acknowledging the limitations of their approaches and continuously improving them. Surely, in a game as complex as basketball, a “single version of the truth” will elude us for quite some time. Thankfully, many people continue to bring new ideas to the fold.

RR Forum Super Moderator Matt52 recently noted that the Toronto Raptors were one of the first teams to adopt STATS LLC’s optical recognition cameras in their arena. I had caught wind of it before the season started, but didn’t mention anything as it was still “early days” for the technology.

The optical tracking data has several potentially powerful applications. Last year at the Sloan Sports Analytics Conference, I attended the “What Optical Tracking Data Says about NBA Field Goal Shooting” session and came away impressed. These cameras (well, the data from them) can answer questions like “how wide open was player X?” While video analysis can give coaches much of what they need via their “eyes”, it’s impossible to find the time to answer this question for a large number of players over a season. You can see that a defender isn’t closing out as well as he should, but having a precise measurement of all of his closeouts for a season is more powerful. Even more compelling is analyzing the situations in which he closes out better or worse. And then you head back to the video room.
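The “how wide open was he?” question reduces to geometry once you have coordinates. A minimal sketch, assuming the tracking feed gives (x, y) court positions in feet for the shooter and each defender at the moment of release; the input format, the situation tags and the 6-foot “open” threshold are all hypothetical, not STATS LLC’s actual data format or definitions.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical cut-off for calling a shot "open"; not an official definition.
OPEN_THRESHOLD_FT = 6.0


def closest_defender_distance(shooter, defenders):
    """Distance in feet from the shooter to the nearest defender."""
    sx, sy = shooter
    return min(math.hypot(dx - sx, dy - sy) for dx, dy in defenders)


def classify_shot(shooter, defenders, threshold=OPEN_THRESHOLD_FT):
    """Label a shot 'open' or 'contested' by nearest-defender distance."""
    if closest_defender_distance(shooter, defenders) >= threshold:
        return "open"
    return "contested"


def average_closeout_by_situation(shots):
    """Average nearest-defender distance per situation tag.

    shots: iterable of (situation, shooter_xy, defenders_xy) tuples,
    e.g. every shot a given defender was responsible for over a season.
    """
    distances = defaultdict(list)
    for situation, shooter, defenders in shots:
        distances[situation].append(closest_defender_distance(shooter, defenders))
    return {situation: mean(ds) for situation, ds in distances.items()}
```

Grouping by situation is what turns a season of closeouts into a short list of themes for the video room.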

The data can quickly help scout opponents and let coaches focus their video analysis on key themes. [It also brings facts into the argument: as Matt52 pointed out, Jose Calderon is “quicker-than-you-think”.] This saves time and ultimately gives coaches more tools to make adjustments.

A worthy read on this topic: “Can Basketball Reboot Its Box Score?” by the great Joe Treutlein from HoopData.com:

“… basketball is a dynamic and fluid game with 11 constantly moving parts – 10 players and the ball. This often makes it difficult to quantify certain basic events on the court, and even harder to properly assign responsibility among the players.

How does one distinguish the weighting of value between the passer and scorer on assisted field goals? When Jason Kidd makes an unchallenged pass to Dirk Nowitzki shooting a contested, fade-away, mid-range jumper, but Steve Nash gets in the lane, draws a double team, and kicks it out to a wide open Jared Dudley in the corner, are both of those passes providing equal value? How about if only Dirk’s goes in – was Nash’s worth nothing?

Indeed, Oliver pointed to proper division of credit and blame being among the largest challenges facing the advancement of statistics in basketball, and said he’s actually spent some free time reading legal theory books trying to get a better handle on the issue.”

I have added the emphasis above. This was exactly the point I was trying to make in my original “Stats, Lies and Eyes” post.

Another interesting paper to come out of Sloan this year was Kirk Goldsberry’s “CourtVision: New Visual and Spatial Analytics for the NBA”. Why? Because it helps answer questions. For example, let’s look at Tyson Chandler’s league-leading FG% and perhaps take my Wins Produced-derived conclusion that this high-percentage shooter should take many of Carmelo Anthony’s shots. Goldsberry’s paper notes that every player in last year’s FG% top 10 was either a forward or a center. However, other factors matter (including eFG% versus basic FG%), such as Spread (a player’s total spatial spread across all scoring cells) and Range (a player’s effective shooting range across all scoring cells). Why? Because if Tyson Chandler’s spread percentage is much lower than that of someone like Kobe Bryant or Ray Allen, a problem emerges. It’s hopefully intuitive, but a coach cannot simply “take” shots from a low-efficiency shooter like Carmelo Anthony and “give” them to Chandler if the vast majority of Chandler’s shots come from only a few feet from the basket, several of them putbacks. The opposing coaches are not dim; they will adjust. Thus, spread and range are important concepts in an efficient offense: keeping defenses “honest” is key.
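The FG% versus eFG% distinction, and the spread and range ideas, can be sketched in a few lines. eFG% uses the standard formula (FGM + 0.5 × 3PM) / FGA. For spread and range, this sketch assumes a player’s shots are already binned into scoring cells with (attempts, points) per cell, and treats spread as the share of cells with at least one attempt and range as the share of cells where the player averages at least one point per attempt. Those two definitions are a paraphrase for illustration; check Goldsberry’s paper for the exact ones.

```python
def fg_pct(fgm, fga):
    """Plain field goal percentage."""
    return fgm / fga


def efg_pct(fgm, three_pm, fga):
    """Effective FG%: a made three counts as 1.5 made twos."""
    return (fgm + 0.5 * three_pm) / fga


def spread_and_range(cells, total_cells):
    """Spread and range shares for one player (definitions assumed, see above).

    cells: dict mapping a scoring-cell id to (attempts, points) for that player.
    total_cells: number of scoring cells the court is divided into.
    """
    attempted = [cell for cell, (att, _) in cells.items() if att > 0]
    efficient = [
        cell for cell, (att, pts) in cells.items() if att > 0 and pts / att >= 1.0
    ]
    return len(attempted) / total_cells, len(efficient) / total_cells
```

A player who takes nearly every shot at the rim can post an excellent FG% while touching only a handful of cells, which is the Chandler point: high efficiency, low spread.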

Local favourite Eric Koreen sums it all up well:

A New New Thing: Advanced Statistical Plus/Minus (ASPM)


With Bargnani out of the lineup for so long and the overall lack of talent on the roster this year, it is harder to debate optimal lineups, minutes played, etc. The Raptors are going to lose more often than not, and no amount of tweaking (except the obvious switch from Rasual Butler to James Johnson) will make a dramatic difference.

However, with the theme of “new new things”, I thought I’d highlight Advanced Statistical Plus/Minus. As I mentioned a few posts ago, the “Sport Skeptic” examined the predictive power of various “holy grail” metrics and ASPM came out on top (it also came out near the top for explanatory power).

The following shows the Raptors’ ASPM and Value over Replacement Player data:

Daniel, the author, explains the concept here.

The metric also seems to pass the “eye test”. Jose Calderon shows up as the offensive star he is, but also as a defensive liability. Meanwhile James Johnson, who may not rate well on a PER or WP scale, rates highly here for his defensive ability.

Questions? There is a dedicated “Statophile Q&A” forum thread here. If you prefer to send questions privately, you’re welcome to email me at tomliston [at] gmail [dot] com or find me on Twitter (@Liston).

21 Responses to “Statophile 28 | The New New Things”

  1. Andres Alvarez

    Tom,
    You’re such a fan of stats. You linked a link post (we were repeating Wayne Winston) as an example that we’re more interested in criticizing other stats. Is that really indicative of what we spend most of our time at the Wages of Wins Journal?
    In terms of accepting challenges, we have a pretty comprehensive FAQ. We have our steps for calculating the metric fully fleshed out. We’ve answered critics pretty well. The fact that they don’t like our answers does not mean we’ve done nothing.

    • Tom Liston

      From what I’ve read, you do *not* spend most of your time criticizing other stats.  The WoW blog team does spend quite a bit of time doing so, however.  And the link includes commentary from Dr. Berri. BTW, I agree that PER “rewards” volume shooters.

      The Wages of Wins FAQ is good and answers most of the questions (see link: http://wagesofwins.com/faq/).  As I said in my previous post, it does not address the allocation problem (and it misses important elements).  I’m glad Dean Oliver thinks the same way.  Several metrics share this problem.

      Dr. Berri has made significant contributions to the field.  I’ve enjoyed both of his books.  I do have an issue when the WoW team makes bold statements based on an imperfect metric.

      For example, when the WP model was changed, some scores changed dramatically (Blake Griffin’s WP48 was 43% higher under “old” WP48 vs “new” WP48), yet for years it was argued WP did not have a rebounding bias. And conclusions were drawn previously based on data that was inflated by 43%.
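      The direction of that percentage is worth spelling out. If the old WP48 was 43% higher than the new one, the new figure is roughly 30% below the old one (not 43% below), since:

```latex
\mathrm{WP48}_{\text{old}} = 1.43 \times \mathrm{WP48}_{\text{new}}
\quad\Longrightarrow\quad
\mathrm{WP48}_{\text{new}} = \frac{\mathrm{WP48}_{\text{old}}}{1.43} \approx 0.70 \times \mathrm{WP48}_{\text{old}}
```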

      I would echo Evan’s (The City) post: http://thecity2.com/2010/12/12/ezpm-yet-another-model-for-player-evaluation/
      “First, let me emphasize that my understanding and appreciation of WP is what led me to start considering alternative models.”  as well as: “If my post inspires Berri and others to re-consider their models, great. If not, that’s fine, too. “

    • mathletics

      Hey Andres – the Wages of Wins/Win Shares stuff is fascinating and useful, but it obviously has its flaws, and shouldn’t be used as “proof” or the sole criterion to argue a larger subject.  The single biggest issue is of course the reliance on box-score data that’s inherently flawed (garbage in/garbage out) – and the complete ignorance of defence (treating it as a “team activity” allows a regression-type analysis to spit out correct values based on other factors, but says nothing about the individual’s performance aside from rebounds/blocked shots – shot altering, defensive closeouts, preventing entry passes to stop high-percentage field goal attempts, etc. are all data points that would need to be properly tracked in order to arrive at an individual’s proper defensive value).  The Miami Heat, for instance, have a wide variety of in-house measurements they apply to every possession that illustrate the value of a player like Joel Anthony in ways that a box score never could (http://offthedribble.blogs.nytimes.com/2011/04/19/look-beyond-the-box-score-for-joel-anthonys-value/).  By only factoring in points/rebounds/etc., he’s a net loss, but by properly valuing the other activities he performs on the court, he’s a monster (which is backed up by large-sample plus-minus data, if you buy into that stuff).
      I’m not hating on the metric, I just wish the folks at WoW would acknowledge that it can be improved, and that it’s not a be-all end-all at this point.  I’d love to see, for instance, what value can be attached to supposedly low-efficiency perimeter scorers who can create their own shot in the last five seconds of the shot clock.  The goal in a possession is obviously to create the shot with the highest level of efficiency – however, quite often a situation dictates that a shot needs to be taken late in the shot clock (and someone has to take that shot).  A perimeter guy will take a hit in his overall efficiency if he makes 4 of 11 of these in a game, for instance, but the team is far better off with those 4 makes than with 11 shot clock expirations or shots from a big man out of his comfort zone.
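      The arithmetic behind that last point, as a rough sketch: it assumes every late-clock possession ends in a single two-point attempt and that a shot-clock violation scores zero (no free throws, offensive rebounds or threes). `points_per_possession` is a hypothetical helper, not part of any published metric.

```python
def points_per_possession(makes, attempts, points_per_make=2):
    """Average points per possession when every possession ends in one shot."""
    # Assumes one two-point attempt per possession and no free throws.
    return makes * points_per_make / attempts


possessions = 11
scorer = points_per_possession(4, possessions)      # 4 of 11 late-clock shots
violations = points_per_possession(0, possessions)  # 11 shot-clock expirations
print(f"FG% on those shots: {4 / possessions:.1%}")
print(f"Scorer: {scorer:.2f} points per possession, violations: {violations:.2f}")
```

      That works out to about 0.73 points per possession against zero, even though a 36.4% FG% looks poor on its own.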

  2. Statement

    Tom,
     
    I asked this question before but you probably missed it.
     
     “As I mentioned a few posts ago, the “Sport Skeptic” examined the predictive power of various “holy grail” metrics and ASPM came out on top”

    I also believe that Wins Produced did pretty well for explaining in-year productivity.

    However, what exactly is ASPM predicting?

    To put it another way, when forecasting a variable, you have historical data to which you can compare the results of your forecast model to see how well it sizes up.

    In the case of ASPM and Wins Produced (the two winners, based on the Sport Skeptic’s stuff), what is the “historical data” that you are comparing Wins Produced and ASPM to?

    I don’t get it.

    Thanks,

     

      • Theswirsky

        Question

        (and I’m not completely up on all the changes that were made so I may be a bit confused as to what has happened but,)

        the ‘old’ wins produced model was, apparently, more accurate (?) than the ‘new’ wins produced?  And one of the biggest factors that challenged the ‘old’ wins produced was the theory that it was too heavily weighted towards rebounds?

        Makes me wonder, was WP changed to ‘appease’ the majority at the cost of accuracy (and therefore its very purpose, and the purpose of the science in general)?

        Perhaps people need to change their perception…. specifically what constitutes or defines a ‘possession’. 

          • Theswirsky

            thanks for the link.

            Brings up another question though. 

            If the change was to take into account diminishing returns, since a player takes rebounds away from others, should that not apply to every stat? (And further, within every advanced stat?)

            A steal made by Jose is one James Johnson COULD have had.  A shot taken (made or missed) was a shot Bargnani COULD have taken.  A turnover Amir made is one Ed Davis COULD have made.

            I’m not opposed to the concept behind what they’re doing… but it seems that if you do it for one stat, it needs to be taken into account for all of them… as anyone else could have (conceivably, anyway) done the same thing had their teammate not taken away that opportunity.

             

            • mountio

              I think your argument is theoretically true .. but the impact on defensive rebounds is MUCH more acute than on any other stat. That seems to be their explanation (although for stats guys trying to explain things, they leave their explanations pretty opaque in my opinion) ..
              As for diminishing returns in other stats .. I’m not so sure. Yes, every once in a blue moon, two guys would be in position to make a steal, but only one gets it. With shots, turnovers etc .. I would think the argument would go that, while every shot takes away a shot that COULD have been taken by someone else, it doesn’t take away the same shot with the same change of possession % and so on (whereas many defensive rebounds do .. the exact same change of possession would happen whether player A or B grabs the ball).
              Maybe I’m reaching or getting this wrong .. but this would seem to explain it (at least in part) to me.

              • Theswirsky

                I understand what you are saying. But it’s more about knowing what did happen vs what could have happened and, most importantly, applying it equally.

                (I’d also mention – I believe – FG% has the biggest impact on WP, with rebounds a close second.  Not necessarily important though)

                If Kevin Love grabs 15 rebounds… that means there are 15 fewer for his teammates. Ok, makes sense. But if Chris Paul makes 4 steals, that also means there are 4 fewer available for his teammates.

                So while it’s likely the steals may not have otherwise happened, we don’t know that about Kevin Love’s rebounds either. We can assume it’s more likely…. but assuming what happened or what’s true was exactly what I thought WP was trying to avoid. What we do know, though, is what DID happen, and that was true.

                That said, if we are going to try to predict what will/could happen, with as high a degree of accuracy as possible, by assuming X rebounds could be had by teammates… it’s only fair that’s applied across the board (proportionally, of course)

                It may not make any but a marginal difference in, say, just steals… but steals + assists + blocks + TO, that would add up in aggregate. Diminishing returns for shots (makes and misses) or even FTs would perhaps (just a guess) be comparable as they are common possessions as well.

                (if you assume there is a point where any one rebound adds less value because teammates can do it, is there not a point where made shots are less valuable because teammates can do it, or conversely missed shots are less damaging because teammates could miss them instead?)
                 

                (the above, of course, has no true value or accuracy… just an example)

                • 2damkule

                  ‘If Kevin Love grabs 15 rebounds… that means there are 15 less for his teammates. Ok makes sense. But if Chris Paul makes 4 steals, that also means there are 4 less available for his teammates.’

                  have to disagree with the premise presented here – a rebound is a quantifiable event that DEFINITELY exists following every missed shot.  a steal is created, and exists only as a direct result of the play by the defender.  essentially, the steal exists in & of itself, while a rebound only ever follows a specific event….you can’t argue, IMO, that player X making a steal means that that steal EXISTED & would have/could have been made by player Y.

                • Theswirsky

                  the opportunity to rebound exists after every missed shot

                  but

                  an opportunity to steal exists on every possession

                  there are 2 types of rebounds  offensive and defensive. If one says a steal doesn’t exist on any specific possession then I don’t think one can claim a defensive rebound or offensive rebound existed on any specific possession either

                • Nilanka15

                  Not sure if I agree with this.  After every missed shot, a rebound exists.  It’s there for everyone to see.  A loose ball in the air, where the first player to secure it gains possession for his team.

                  I don’t think we can say the same about steals.  As yertu mentioned, they have to be ‘created’ by the defensive player.  An entire 48 minute game can be played without the opportunity for a steal.  That’s definitely not true for rebounds (unless of course, both teams shoot 100% from the field).

                • Theswirsky

                  again though. We need to remember there are 2 types.

                  So if I shoot and the defense runs the other way and I grab the offensive board… did a defensive board actually exist?

                  Is that any different than I’m dribbling the ball and the defender doesn’t try and take it from me?

                  Yet if no one tries to steal the ball they still had the opportunity.  Same as if no one tries to defensively rebound they still had the opportunity to.

                  Don’t get me wrong, that’s extreme and just theory…. but as I mentioned above, we don’t know what would or should have happened… just what did happen.  If we try to calculate what would/should happen with a stat, a possession or a game (or advanced metric in general) we are getting into a whole new realm of insanity (#s wise) and likely lose all meaning or reasonable accuracy.

                  That’s why I question diminishing returns existing with just one statistic – if you do it for one, it should exist for all (or at least all those that merit them, i.e. steals was just an example, but I can see steal #s being so low that one almost never hits a diminishing returns wall.  But assists, FG made, FG missed… those could all easily have diminishing returns as well)

                  opportunity

            • Tom Liston

              That was my original point.  Many metrics have an “allocation problem.”  I was happy to hear (I wasn’t there first-hand) that Dean Oliver agrees, as told by HoopData.com writer Joe Treutlein:
              “Oliver pointed to proper division of credit and blame being among the largest challenges facing the advancement of statistics in basketball…”
              http://espn.go.com/blog/truehoop/post/_/id/38559/can-basketball-reboot-its-box-score
              Oliver brings a ton of credibility to the argument. 

              • Theswirsky

                thanks for the links again. 

                Just wanted to add, we need to remember that a lot of these metrics were created to improve accuracy and ease of reading the box score and with no intentions of perfection (rather as perfect as reasonably possible BASED on the box score).

                As few can watch every minute of every game every season, and even within that everyone has an individual bias to what constitutes good basketball, the box score gives us an objective idea of what ‘really’ happened within the confines of the box score itself. That idea is then quantified in a singular numerical value for a fair and reasonable comparison.

                If we want more than what the box score can offer then we either need to:

                1) expand the box score – but that costs time/money/resources

                2) watch every game and be completely objective – and that’s probably impossible

            • Tom Liston

              As for “diminishing returns” applying to all stats – well, to some degree, sure.  Rebounds have the highest.  But steals depend more on your opponent’s usage – hence why the league leaders are usually point guards.  (As I noted before, steals are based on a successful outcome. Thus, a defender could gamble and attempt several steals without being successful. The “missed steal” gamble can be expensive, as the defender is often well out of position – see
              http://raptorsrepublic.com/2012/02/02/statophile-24-stats-lies-and-eyes/ )

  3. Brian B

    This illustrates something under-appreciated about DD’s defensive limitations – the question then becomes: does he lack the lateral quicks, the mental framework, or the right system?

    Similar but less dramatic point about Bayless.

    Also nicely captures AJ’s value. And points out that Davis may have more potential than otherwise expected, primarily defensively. But is there room for both?

  4. Kevin

    So, is there a formula that attempts to calculate the amount that a player impacts positively or negatively on the performance of his teammates on the floor? A multiplier effect? Would it be possible to determine a coefficient that shows, through the data, the effect of the double-team drawn, extra pass made, hard pick set? In other words, can you identify Shane Battier or Jorge Garbajosa from stats alone? How might it be done?
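    Regression-based adjusted plus-minus, mentioned in the post above, is one existing attempt at this: it credits each player with the point margin his lineups produce after accounting for who else was on the floor, so in principle effects like drawn double-teams and hard picks show up indirectly rather than as named events. Below is a minimal ridge-regression sketch with NumPy, assuming stint data of (home players, away players, margin per 100 possessions). Real implementations also weight stints by possessions and model home-court advantage, which this sketch omits; the player labels A, B, C and D are hypothetical.

```python
import numpy as np


def adjusted_plus_minus(stints, players, ridge_lambda=1.0):
    """Ridge-regularized adjusted plus-minus (simplified sketch).

    stints: list of (home_players, away_players, margin_per_100) tuples.
    Returns a dict mapping each player to an estimated impact on margin.
    """
    index = {p: i for i, p in enumerate(players)}
    X = np.zeros((len(stints), len(players)))
    y = np.zeros(len(stints))
    for row, (home, away, margin) in enumerate(stints):
        for p in home:
            X[row, index[p]] = 1.0   # on the floor for the home side
        for p in away:
            X[row, index[p]] = -1.0  # on the floor for the away side
        y[row] = margin
    # Ridge normal equations: (X^T X + lambda * I) beta = X^T y
    A = X.T @ X + ridge_lambda * np.eye(len(players))
    beta = np.linalg.solve(A, X.T @ y)
    return {p: float(beta[index[p]]) for p in players}


# Toy data: player A is on the winning side of every stint.
stints = [
    (["A", "B"], ["C", "D"], 10.0),
    (["A", "C"], ["B", "D"], 10.0),
    (["A", "D"], ["B", "C"], 10.0),
]
print(adjusted_plus_minus(stints, ["A", "B", "C", "D"]))
```

    In this toy data A gets the large positive estimate and the other three share the negative margin. The ridge penalty shrinks estimates toward zero, which helps keep them from swinging wildly on small samples.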

  5. FAQ

    Statistics in b’ball is like assessing history — thinking backwards.  It’s like trying to reduce a great symphony to numbers.

    Statistics – a fetish for numbers for the simple mind.

