
Stats: Why does no one look at variance?


  • Stats: Why does no one look at variance?

    I always see people looking at stats as averages, or extrapolated averages (e.g. Per 36 minutes). How come no one ever looks at the variance of those stats?

    For example, the standard deviation for FG% or rebounds would tell us a lot about consistency and, therefore, reliability. If one player shoots 40% one night and 50% another, whereas another player shoots 45% both nights, who would you rather have?
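A minimal sketch of that two-night example in Python, using only the standard library. The two-game samples are the hypothetical numbers from the paragraph above, not real box scores, and `summarize` is a made-up helper name.

```python
from statistics import mean, pstdev

# Hypothetical per-game FG% for the two players in the example above.
player_a = [40.0, 50.0]
player_b = [45.0, 45.0]

def summarize(fg_pcts):
    """Return (mean, population standard deviation) of per-game FG%."""
    return mean(fg_pcts), pstdev(fg_pcts)

mean_a, sd_a = summarize(player_a)
mean_b, sd_b = summarize(player_b)

# Both players have the same average, but only player A has any spread.
print(f"Player A: mean={mean_a:.1f}, sd={sd_a:.1f}")
print(f"Player B: mean={mean_b:.1f}, sd={sd_b:.1f}")
```

Note that averaging per-game percentages is not the same as a season FG%, which weights each game by shot attempts, so this is only an illustration of the spread idea.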

    I'm not sure if the advanced metrics take this into account, but I haven't read that they do. I guess this is more a question for Blake, but just wondering, is all...

  • #2
    Couple of things:

    - I think your underlying premise that consistency isn't evaluated or considered is wrong.
    - Single games are too small a sample to draw conclusions from (just look at the Bulls/Nets and Thunder/Rockets series). There are dozens of variables that could explain a poor performance.
    - It would only mean much if the variance was big and consistent. My guess is that few players fit this bill. And the ones that do, well, we already evaluate them correctly.
    - It all gets factored in to the aggregate number anyway, so, poor performances are reflected in aggregate numbers.

    The best example of this on the Raps is DeRozan. People talk about how he has improved various parts of his game. This is true. But if you look at him as a whole player, and his production levels, the various tweaks and improvements have made little to no difference in overall productivity or efficiency. So, despite the fact he had some big games and seemed to be a better player, he has the same impact on games he had last year and the year before.

    • #3
      slaw wrote:
      The best example of this on the Raps is DeRozan. People talk about how he has improved various parts of his game. This is true. But if you look at him as a whole player, and his production levels, the various tweaks and improvements have made little to no difference in overall productivity or efficiency. So, despite the fact he had some big games and seemed to be a better player, he has the same impact on games he had last year and the year before.
      The new 100 words on Demar post is actually what sparked this thought, "DeRozan’s shooting percentage were nothing short of a rollercoaster" (Ryan McNeil). You don't necessarily need to look at variance on a game-to-game basis. You can look at DeRozan's shooting percentages each month of the season and compare them to someone similar.
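As a rough sketch of the month-by-month idea, here is one way to do it in Python. The game log below is entirely made up for illustration (it is not DeRozan's real data), and `monthly_fg_pct` is a hypothetical helper name. Makes and attempts are summed within each month so that each monthly percentage is attempt-weighted.

```python
from collections import defaultdict
from statistics import pstdev

def monthly_fg_pct(games):
    """games: iterable of (month, field_goals_made, field_goals_attempted).

    Returns {month: FG% as a fraction}, pooling makes and attempts per month.
    """
    totals = defaultdict(lambda: [0, 0])
    for month, made, attempted in games:
        totals[month][0] += made
        totals[month][1] += attempted
    return {m: made / att for m, (made, att) in totals.items() if att > 0}

# Hypothetical game log, for illustration only.
games = [
    ("Nov", 6, 15), ("Nov", 4, 10),
    ("Dec", 9, 15), ("Dec", 6, 10),
    ("Jan", 5, 10), ("Jan", 5, 10),
]

by_month = monthly_fg_pct(games)
# Spread of the monthly percentages: larger means more of a rollercoaster.
month_to_month_sd = pstdev(by_month.values())

for month, pct in by_month.items():
    print(f"{month}: {pct:.3f}")
print(f"month-to-month sd: {month_to_month_sd:.3f}")
```

Running the same function on a comparable player's log would give a second spread to compare against.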

      I'm just very critical of aggregate numbers. If one player gave you exactly 18 points per game, every single game, then everything is easy to model. But even though aggregate numbers incorporate poor performances, they don't tell us how often those poor performances are, or how often those amazing performances are. Again:

      - Player A averages 18 ppg, with a range of 12-20
      - Player B averages 18 ppg, with a range of 4-36

      Despite their aggregate averages being similar, they are very different players!

      • #4
        Marz wrote:
        - Player A averages 18 ppg, with a range of 12-20
        - Player B averages 18 ppg, with a range of 4-36

        Despite their aggregate averages being similar, they are very different players!
        Even just doing that would have flaws, because no player scores points in a vacuum. There are many factors to consider; some directly related to the player, some indirectly related to a player and some unrelated to the player.

        For instance, if there's a game where a player is being guarded by an elite lockdown defender and some of his teammates have very favourable matchups, should the player be 'penalized' if the coach draws up plays to exploit those advantages and the player follows the plan by deferring to his teammates? Like most stats, just looking at them on the surface can be very misleading and doesn't explain the "why" behind them.

        For a starter who plays significant minutes for most/all games throughout a season, I think the law of averages will account for the 'positive' and 'negative' games related to a specific stat. Although DeRozan had some games that made him look like a budding star, his averages really show that he actually made very little improvement over last season. Also, for any player, you can easily get one impression from 'results' type stats, but an entirely different impression when efficiency is factored into the analysis.


        DeRozan: 2012-2013 (2011-2012) Stat Comparisons

        PPG: 18.1 (up from 16.7), but needed 15.0 shots per game (up from 14.3)

        3PT: 28.3% (up from 26.1%), but identical 0.4 makes / 1.5 attempts per game (both this season and last season)

        FT: 83.1% (up from 81.0%), but nearly identical 4.3 makes / 5.2 attempts per game (had 5.3 attempts per game last season, so he actually got to the line less this season)

        His peripheral stats this season were all similar to last season, but definitely improving:
        RPG: 3.9 (3.3 last season)
        APG: 2.5 (2.0 last season)
        STL: 0.9 (0.8 last season)
        BLK: 0.3 (0.3 last season)
        TO: 1.8 (2.0 last season)

        All of that was a result of playing more MPG this season (36.7, up from 35.0 last season) and being a bigger focal point in the team's offensive gameplan.


        These are just the raw stats. It's up to each individual fan to interpret/analyse them however they see fit, compare them to their own subjective 'eye test' and come to their own conclusion about DeRozan (or any other player).
        Last edited by CalgaryRapsFan; Thu Apr 25, 2013, 12:28 PM.

        • #5
          Why are you guys so quick to shut this idea down? I think it's a fantastic idea. How 'bout it, Blake? Would be grateful for an analysis!

          The counter-arguments didn't convince me. If a player defers then it will affect their FGA, but you could look at the variance of TS%, and I think it would say a lot when comparing scorers. Statisticians already find big discrepancies with home and away performance, so why not look at variance in general.

          • #6
            red monkey wrote:
            Why are you guys so quick to shut this idea down? I think it's a fantastic idea. How 'bout it, Blake? Would be grateful for an analysis!

            The counter-arguments didn't convince me. If a player defers then it will affect their FGA, but you could look at the variance of TS%, and I think it would say a lot when comparing scorers. Statisticians already find big discrepancies with home and away performance, so why not look at variance in general.
            I certainly think it is worth investigating, as a type of advanced stat.

            • #7
              red monkey wrote:
              Why are you guys so quick to shut this idea down? I think it's a fantastic idea. How 'bout it, Blake? Would be grateful for an analysis!

              The counter-arguments didn't convince me. If a player defers then it will affect their FGA, but you could look at the variance of TS%, and I think it would say a lot when comparing scorers. Statisticians already find big discrepancies with home and away performance, so why not look at variance in general.
              I agree. As someone who does statistical modeling for a living, I think this analysis would be very meaningful and useful. For example, I model shipments of goods to Home Depot stores nationwide. The amount of goods demanded varies greatly throughout the year. Suppose I just looked at the aggregate and said, we sell 50 bags per store per week on average, so let's send every store 50 bags a week. This would be horrible, as there are parts of the year where stores need 200 bags, and parts where they need none. It's much better to drill down and look at everything on a weekly basis. This doesn't map directly onto basketball, but let me try to explain.

              For simplicity, I'll use pts/game, but any stat could work here; the best option would probably be a graph of pts/game and TS% together so you could see that relationship as well. If you were to graph a player's pts/game, with each game being a data point, you could set a threshold for an acceptable number of points, say 15, and then count the number of games above and below the threshold. This would be important, because if DD is alternating 25pt games with 5pt games, then we clearly have a greater probability of winning the 25pt games and a greater probability of losing the 5pt games. Whereas if he were scoring 15pts every game, he'd be giving us the same probability of winning each night. To avoid the pitfalls of small sample size, you look at an entire season and calculate winning percentages for games above and below the threshold. This could really magnify the value of having a consistent 15pt scorer versus one who alternates 25pt and 5pt games. You could use TS% to do the same analysis and look for links between the two. It may turn out that a player who scores 15pts every single night is far more valuable than a player who averages 18pts over the season but has a large number of games scoring below 10pts.
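The threshold idea above could be sketched in Python roughly like this. The game log is invented purely for illustration, the 15-point threshold is the one suggested in the post, and `win_pct_by_threshold` is a hypothetical helper name. Real use would need a full season of games, as noted above.

```python
def win_pct(results):
    """Fraction of True values in a list of win/loss booleans, or None if empty."""
    return sum(results) / len(results) if results else None

def win_pct_by_threshold(games, threshold):
    """games: list of (points, team_won) tuples.

    Returns (win% in games at or above the threshold, win% in games below it).
    """
    above = [won for pts, won in games if pts >= threshold]
    below = [won for pts, won in games if pts < threshold]
    return win_pct(above), win_pct(below)

# Hypothetical game log: (points scored, did the team win?)
games = [
    (25, True), (5, False), (25, True), (5, False),
    (25, False), (5, True), (16, True), (14, False),
]

above_pct, below_pct = win_pct_by_threshold(games, threshold=15)
print(f"win% at or above 15: {above_pct:.2f}")
print(f"win% below 15: {below_pct:.2f}")
```

The same function works with per-game TS% in place of points, given a suitable threshold.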

              I'd love to see this analysis done. It wouldn't be perfect at first, but a little time in Excel experimenting with different graphs and thresholds could provide some interesting results on how valuable consistency is versus random huge games. A player who goes off for 30 against the Bobcats but then goes for 5 against the Celtics is probably less valuable than a player who scores 15 against both of them.

              • #8
                Marz wrote:
                The new 100 words on Demar post is actually what sparked this thought, "DeRozan’s shooting percentage were nothing short of a rollercoaster" (Ryan McNeil). You don't necessarily need to look at variance on a game-to-game basis. You can look at DeRozan's shooting percentages each month of the season and compare them to someone similar.

                I'm just very critical of aggregate numbers. If one player gave you exactly 18 points per game, every single game, then everything is easy to model. But even though aggregate numbers incorporate poor performances, they don't tell us how often those poor performances are, or how often those amazing performances are. Again:

                - Player A averages 18 ppg, with a range of 12-20
                - Player B averages 18 ppg, with a range of 4-36

                Despite their aggregate averages being similar, they are very different players!
                Yes, but even that doesn't really tell much of the real story. Let's say I told you that there was a player who averaged a little over 10 points, with a range of 0 to 35. Let's call this guy player A. Now, let's take another guy (player B), who averages 10 points, with a range of 0 to 24. Sounds like sort of similar players, right? But just based on those numbers, do you have a picture of who this guy is? Is he a guy who scores about nine points most games, and then had just one crazy night that totally throws his range out of whack? Or is he a guy who fluctuates wildly from night to night?

                So let's define average games as average +/- 20% (I'm just throwing a number out here). Now, I'll break down our mystery player A's game: 38% of his games were above average, 46% were below average, and 15% were average. So now we're starting to see that he's the sort of player who fluctuates wildly.
                Mystery player B's game breaks down this way: 32% of his games are above average, 30% are average, and 38% of his games are below average. His performance is much more consistent, and he actually produces a greater percentage of average or better games, despite having a lower upper-end to his range.
                By breaking down games into these levels (and ideally you could use more than just 3 levels... 5 might be ideal), and perhaps using a more comprehensive stat, like the gamescores available on basketball-reference.com (I know, I know, it's a Hollinger stat), you'd get a more complete picture of how often these players win you games, how often they lose you games, and how often they turn in a consistent performance.
                If you're curious, player A is Alan Anderson, player B is Amir Johnson.
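For anyone who wants to try the three-level breakdown, here is a minimal Python sketch. The 20% band is the number thrown out above, the points list is invented for illustration (it is not Anderson's or Johnson's actual game log), and `bucket_games` is a hypothetical helper name.

```python
from collections import Counter
from statistics import mean

def bucket_games(points, band=0.20):
    """Classify each game as above, average, or below relative to the player's mean.

    A game counts as 'average' if it falls within mean +/- band (20% by default).
    Returns the share of games in each bucket.
    """
    avg = mean(points)
    low, high = avg * (1 - band), avg * (1 + band)
    counts = Counter()
    for p in points:
        if p > high:
            counts["above"] += 1
        elif p < low:
            counts["below"] += 1
        else:
            counts["average"] += 1
    n = len(points)
    return {level: counts[level] / n for level in ("above", "average", "below")}

# Hypothetical points-per-game log, for illustration only.
points = [10, 9, 11, 0, 20, 10, 11, 9]
shares = bucket_games(points)
for level, share in shares.items():
    print(f"{level}: {share:.1%}")
```

Swapping points for Game Score, or splitting into five levels instead of three, would only change the input list and the bucket boundaries.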

                edit: yeah, I like Primer's suggestions here too, particularly trying to incorporate the impact of highs and lows and consistent scores in terms of actual wins.
                Last edited by octothorp; Thu Apr 25, 2013, 03:18 PM.

                • #9
                  octothorp wrote:
                  Yes, but even that doesn't really tell much of the real story. Let's say I told you that there was a player who averaged a little over 10 points, with a range of 0 to 35. Let's call this guy player A. Now, let's take another guy (player B), who averages 10 points, with a range of 0 to 24. Sounds like sort of similar players, right? But just based on those numbers, do you have a picture of who this guy is? Is he a guy who scores about nine points most games, and then had just one crazy night that totally throws his range out of whack? Or is he a guy who fluctuates wildly from night to night?

                  So let's define average games as average +/- 20% (I'm just throwing a number out here). Now, I'll break down our mystery player A's game: 38% of his games were above average, 46% were below average, and 15% were average. So now we're starting to see that he's the sort of player who fluctuates wildly.
                  Mystery player B's game breaks down this way: 32% of his games are above average, 30% are average, and 38% of his games are below average. His performance is much more consistent, and he actually produces a greater percentage of average or better games, despite having a lower upper-end to his range.
                  By breaking down games into these levels (and ideally you could use more than just 3 levels... 5 might be ideal), and perhaps using a more comprehensive stat, like the gamescores available on basketball-reference.com (I know, I know, it's a Hollinger stat), you'd get a more complete picture of how often these players win you games, how often they lose you games, and how often they turn in a consistent performance.
                  If you're curious, player A is Alan Anderson, player B is Amir Johnson.

                  edit: yeah, I like Primer's suggestions here too, particularly trying to incorporate the impact of highs and lows and consistent scores in terms of actual wins.
                  Your levels are essentially looking at the "variation" from the average. Not in the strict mathematical sense, but you've basically helped describe my point: looking at how a stat fluctuates is so much more meaningful than a single number.

                  I, too, work in modelling, but also in generating numbers synthetically that correspond to those models. And whenever I use an aggregate number, without taking into account variation, my generated numbers always have lower fidelity. Hence the idea.

                  @Primer I like those ideas. How do we get Blake to do the leg work again? :P

                  • #10
                    A problem is that there are so many reasons for variance independent of a player's consistency.

                    For example, if it's a big who's getting a lot of his points from PnR situations, his points will be influenced by the way the opposing team defends the PnR. The same holds true, of course, for the ball handler.

                    It's also highly dependent on the role a player has on the team, just as it's dependent on the game plan. If there are favorable matchups on the floor, a coach will try to exploit those. That doesn't say anything about the consistency of that player, and some coaches try to make more use of this than others.

                    If we take DeRozan as an example, his variance is also related to the way he is being defended (and the quality of that defender). In fact, I think that's one of the most important factors in his case. Does that mean that this makes him inconsistent? No, the explanation for that part of his variance stems from the variance in defenders.

                    And I think there are countless other examples to show other influences besides consistency for a stat like this.

                    All in all, I think there are too many other factors besides a player's consistency causing noise in a stat like this.

                    • #11
                      Soft Euro wrote:
                      For example, if it's a big who's getting a lot of his points from PnR situations, his points will be influenced by the way the opposing team defends the PnR. The same holds true, of course, for the ball handler.

                      It's also highly dependent on the role a player has on the team, just as it's dependent on the game plan. If there are favorable matchups on the floor, a coach will try to exploit those. That doesn't say anything about the consistency of that player, and some coaches try to make more use of this than others.
                      The simple solution to this is to compare 'variance' by position. For example, compare a PF to a PF, a SF to a SF and so on. Even though 'variance' provides a more accurate indication of a player's consistency, it's OBVIOUSLY not been considered important enough to be used. I have nothing to prove this, but in this day and age, it's highly unlikely that the idea has not been explored.
                      Attitude Is A Choice.

                      • #12
                        Eric Akshinthala wrote:
                        The simple solution to this is to compare 'variance' by position. For example, compare a PF to a PF, a SF to a SF and so on. Even though 'variance' provides a more accurate indication of a player's consistency, it's OBVIOUSLY not been considered important enough to be used. I have nothing to prove this, but in this day and age, it's highly unlikely that the idea has not been explored.
                        That only solves a small part of the problem. I could add tons of other examples of context influencing a player's production, especially if we focus on points.

                        What do you mean by a "more accurate indication"? More accurate than what? And why would it be accurate at all? Too much noise.

                        • #13
                          Soft Euro wrote:
                          What do you mean by a "more accurate indication"? More accurate than what? And why would it be accurate at all? Too much noise.
                          While other factors are considered, the most common stats used to judge a player's offensive consistency are PPG and FG%. As Marz, who started this thread, gave as an example: if player A shoots 40% one night and 50% another, as opposed to player B who shoots 45% on both nights, player B is more consistent. This can be determined when 'variance' is considered. Hence a more accurate indication of consistency.
                          Attitude Is A Choice.

                          • #14
                            Another interesting stat to look at would be on/off +/-. If there is a large variation in that from night to night, it shows how that player's inconsistency is affecting the team as a whole. Even for someone like DD, who currently has a higher off-court +/- than on-court, the game-by-game analysis would at least show when he is doing worse and better (not necessarily good and bad, since you could argue it's always bad when the +/- favors you being off the court). If DD has lots of nights with a +6 on court, followed by lots of nights with a -5 on court, then you know his inconsistency is causing the team to lose games, and they'd be better off with a player who consistently gets +1 on court every night, or even +0, as those -5 on-court games probably equal losses. Maybe every player is wildly inconsistent in those stats, I'm not sure. That's why the analysis would be cool to see.

                            • #15
                              Eric Akshinthala wrote:
                              While other factors are considered, the most common stats used to judge a player's offensive consistency are PPG and FG%. As Marz, who started this thread, gave as an example: if player A shoots 40% one night and 50% another, as opposed to player B who shoots 45% on both nights, player B is more consistent. This can be determined when 'variance' is considered. Hence a more accurate indication of consistency.
                              Ok, I get it, but I don't particularly agree that you'll find out who's more consistent that way. It would be nice if someone could give an example of just a comparison between two players.
