Bayesian Statistics: what is it about?

First of all, if you are unfamiliar with Bayes’ Law, here is a very nice video that explains both the formula AND the concept. Yes, the end of the video goes into social commentary (and makes some interesting points) but the math before it is very good.

If you’re already familiar with those ideas, you can start here.

Let’s start with an example from sports: basketball free throws. At certain times in a game, a player is awarded a free throw, where the player stands 15 feet away from the basket and is allowed to shoot to make a basket, which is worth 1 point. In the NBA, a player will take 2 or 3 shots; the rules are slightly different for college basketball.

Each player will have a “free throw percentage” which is the number of made shots divided by the number of attempts. For NBA players, the league average is .672 with a variance of .0074.

Now suppose you want to determine how well a player will do, given, say, a sample of the player’s data. Under classical (aka “frequentist”) statistics, one looks at how well the player has done, calculates the observed percentage (\hat{p}) and then determines a confidence interval for the true p: using the normal approximation to the binomial distribution, this works out to \hat{p} \pm z_{\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}


Yes, I know..for someone who has played a long time, one has career statistics ..so imagine one is trying to extrapolate for a new player with limited data.

That seems straightforward enough. But what if one samples the player’s shooting during an unusually good or unusually bad streak? Example: former NBA star Larry Bird once made 71 straight free throws…if that were the sample, \hat{p} = 1 with variance zero! Needless to say that trend is highly unlikely to continue.
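To see the problem concretely, here is a minimal sketch of the normal-approximation interval applied to the streak. The function name `wald_interval` and the comparison sample of 48 makes in 71 attempts are my own illustrations, not data from any real player; z = 1.645 corresponds to a 90 percent interval.

```python
import math

def wald_interval(makes, attempts, z=1.645):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = makes / attempts
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / attempts)
    return p_hat - half_width, p_hat + half_width

# The 71-for-71 streak: p_hat = 1, so the half width collapses to zero.
streak = wald_interval(71, 71)

# A hypothetical, more ordinary sample of the same size (illustration only).
typical = wald_interval(48, 71)

print("streak :", streak)
print("typical:", typical)
```

With \hat{p} = 1 the term \hat{p}(1-\hat{p}) vanishes, so the interval has zero width no matter how small the sample is.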

Classical frequentist statistics doesn’t offer a way out but Bayesian Statistics does.

This is a good introduction:

But here is a simple, “rough and ready” introduction. Bayesian statistics uses not only the observed sample, but a proposed distribution for the parameter of interest (in this case, p, the probability of making a free throw). The proposed distribution is called a prior distribution or just prior. That is often labeled g(p)

We are dealing with what amounts to 71 Bernoulli trials, each with the same unknown success probability p (the league average of .672 serves as our prior guess for it), so the random variable describing the outcome of each individual shot has probability mass function p^{y_i}(1-p)^{1-y_i} where y_i = 1 for a make and y_i = 0 for a miss.

Our goal is to calculate what is known as a posterior distribution (or just posterior) which describes g after updating with the data; we’ll call that g^*(p) .

How we go about it: use the principles of joint distributions, likelihood functions and marginal distributions to calculate g^*(p|y_1, y_2, ..., y_n) = \frac{L(y_1, y_2, ..., y_n|p)g(p)}{\int^{\infty}_{-\infty}L(y_1, y_2, ..., y_n|p)g(p)dp} (since g(p) is zero outside of [0,1], the integral is effectively taken over that interval).

The denominator “integrates out” p to turn that into a marginal; remember that the y_i are set to the observed values. In our case, all are 1 with n = 71 .

What works well is to use the beta distribution for the prior. Note: the pdf is \frac{\Gamma (a+b)}{\Gamma(a) \Gamma(b)} x^{a-1}(1-x)^{b-1} and if one uses p = x , this works very well. Now because the mean will be \mu = \frac{a}{a+b} and \sigma^2 = \frac{ab}{(a+b)^2(a+b+1)} given the required mean and variance, one can work out a, b algebraically.
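The algebra for a and b is short: since \sigma^2 = \frac{\mu(1-\mu)}{a+b+1}, we get a+b = \frac{\mu(1-\mu)}{\sigma^2} - 1, and then a = \mu(a+b), b = (1-\mu)(a+b). Here is a small sketch of that calculation; the function name `beta_from_moments` is just my label for it.

```python
def beta_from_moments(mean, variance):
    """Solve for beta parameters (a, b) from a desired mean and variance."""
    if not 0 < mean < 1:
        raise ValueError("mean must lie strictly between 0 and 1")
    total = mean * (1 - mean) / variance - 1  # this is a + b
    if total <= 0:
        raise ValueError("variance is too large for a beta distribution")
    return mean * total, (1 - mean) * total

# League-wide numbers quoted earlier: mean .672, variance .0074.
a, b = beta_from_moments(0.672, 0.0074)
print(a, b)
```

Feeding in the rounded variance .0074 gives values that agree with the a and b used in this post to about two decimal places; the small difference comes from how the variance was rounded.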

Now look at the numerator which consists of the product of a likelihood function and a density function: up to constant k , if we set \sum^n_{i=1} y_i = y we get k p^{y+a-1}(1-p)^{n-y+b-1}
The denominator: same thing, but p gets integrated out and the constant k cancels; basically the denominator is what makes the fraction into a density function.

So, in effect, we have kp^{y+a-1}(1-p)^{n-y+b-1} which is just a beta distribution with new a^* =y+a, b^* =n-y + b .

So, I will spare you the calculation except to say that the NBA prior with \mu = .672, \sigma^2 =.0074 leads to a = 19.355, b= 9.447

Now the update: a^* = 71+19.355 = 90.355, b^* = 9.447 .
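The update rule a^* = y + a, b^* = n - y + b is simple enough to write as a tiny function. This is only a sketch; `update_beta` is a name I made up, and the prior values are the ones quoted above.

```python
def update_beta(a, b, makes, attempts):
    """Beta prior plus Bernoulli data gives a beta posterior."""
    misses = attempts - makes
    return a + makes, b + misses

# Prior from the league-wide numbers, data from the 71-for-71 streak.
a_post, b_post = update_beta(19.355, 9.447, makes=71, attempts=71)
posterior_mean = a_post / (a_post + b_post)

print(a_post, b_post, posterior_mean)
```

Note that the posterior mean sits between the prior mean of .672 and the sample proportion of 1: the prior acts a bit like roughly 29 extra shots' worth of league-average data.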

What does this look like? (I used this calculator)

That is the prior. Now for the posterior:

Yes, shifted to the right..very narrow as well. The information has changed..but we avoid the absurd contention that p = 1 with a confidence interval of zero width.

We can now calculate a “credible interval” of, say, 90 percent, to see where p most likely lies: use the cumulative distribution function to find this out:

And note that P(p < .85) = .042, P(p < .95) = .958 \rightarrow P(.85 < p < .95) = .916 . In fact, Bird’s lifetime free throw shooting percentage is .882, which is well within this 91.6 percent credible interval, based on sampling from this one freakish streak.
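If you don’t have a beta calculator handy, the probabilities above can be approximated with nothing but the standard library. This is a rough sketch, not what the calculator does internally: it integrates the beta density with Simpson’s rule (my choice of method), and the names `beta_pdf`, `beta_cdf` and `beta_quantile` are made up for the illustration. It assumes a, b > 1 so that the density is zero at both endpoints.

```python
import math

def beta_pdf(x, a, b):
    """Beta density; returns 0 at the endpoints (valid when a, b > 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_cdf(x, a, b, steps=2000):
    """P(p < x) by Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3

def beta_quantile(q, a, b, iterations=30):
    """Find x with P(p < x) = q by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a_post, b_post = 90.355, 9.447          # posterior parameters from above
below_85 = beta_cdf(0.85, a_post, b_post)
below_95 = beta_cdf(0.95, a_post, b_post)
interval_90 = (beta_quantile(0.05, a_post, b_post),
               beta_quantile(0.95, a_post, b_post))

print(below_85, below_95, below_95 - below_85)
print(interval_90)
```

The two CDF values should land close to the ones quoted above, and the last line gives an equal-tailed 90 percent credible interval directly instead of reading it off round-number endpoints.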

Embarrassed but still trying

To the Riverplex via a 2.5 mile segment and 7.6 home. It was windy going out. On the way back a young person smiled at me and said “great job”…I think that I was losing steam. And I KNOW I looked terrible. But I smiled and said “thank you.” My yoga teacher upped the ante in class…gave us a flowing, high energy, lots of plank/side plank movements. And my knees…side block is easy..not *quite* to sitting on one flat yoga block.

Weight: 186…getting there.

How cool is this?

Yes, we’ve seen the first images of a black hole. A lot of top-notch talent worked hard to make this possible. Here is one of these people, who is destined to go down in the science history books:

 

 

And in sports: regrettably I had to miss the Bradley Baseball team beating Iowa (though I went to a wonderful lecture on digital privacy and law enforcement/trial matters). If only they were on different nights.

surprising…

I didn’t sleep all that much; kind of restless for a while.  Still, my weight room session went well.
pull ups: 15-15-10-10, incline: 10 x 135, 6 x 150, decline: 10 x 165, military: 10 x 50 standing, 15 x 50 seated, 10 x 45 standing. rows: 3 sets of 10 x 50 single arm. Usual planks, headstand, knee stretches (started with 4 lb. medicine ball),..went very well.  Goblet squats: 6 x 30,  6 x 50, 6 x 62 to the sill.

Weight: 187.

Moving right along

Just a workout catch up: Monday: weights only. rotator cuff, pull ups (5 sets of 10, one of 5), bench: 10 x 135, 3 x 185, 7 x 165, military: 8 x 50 standing, 15 x 50 seated (supported), 10 x 90 machine, rows: 3 sets of 10 x 50 each arm, goblet squats (sets of 6: free, 25, 50, 62 to the sill), planks, knee stretches, etc. Knee: start with the 4 lb. ball.

I ran to and from yoga class, taking the long way home.

I am calling this 10, as I did a couple of unintentional out and backs (on Cooper, and again in upper Glenn Oak Park).  Yeah, it was slow; 1:30 on the way back or something to that effect.  The 2.5 miles there took about 30 minutes.
I was able to kneel on the yoga blocks which were stacked on their side.

 

I can’t imagine doing that

After last Friday night’s trail event (I walked, finished 10 miles muddied but feeling good with a lot left “in the tank”), I thought about the 30 mile finishers, the 50, 100, and of course, the 150 and 200 (yes, there is a 200 mile option).

I thought: “I can’t even imagine.”  But here is a critical difference between my saying that and a younger newcomer saying that:  I finished the 30 several times (2003, when it was 50K, and the fall of 2009, 2010, 2014 and 2015; the latter is not listed due to an “early start”, but it took just under 12 hours), and I finished the 50 twice (12:46 in 2004, 31:3x in 2008…DNF’ed and then came back the next morning to “finish”, and yes, I got credit) and the 100 twice (34:16 in 2005, 47:48 in 2009).   When I DNF’ed the 100 in 2016, I was 15:2x at 50 and came back for 2 more loops the next day..and I also DNF’ed the 50 a few years ago.

 

So yes, I did finish some of these longer events..or at least someone with my name and history did.  I just can’t imagine doing that with my current body. 

Ah…so onward to do what I can this morning.

 

Ten miles in the mud

 

The idea: I was to pace a friend who was planning on doing a slow 50 miles, starting at noon on Friday. The course was 5 ten mile loops in McNaughton Park, near Pekin, Illinois. She changed plans due to the nature of the course; it is a challenging course to begin with (featuring 13 uphill segments per loop; mostly 70-80 foot hills) but spring rains and a lack of drainage turned much of the course into a river of mud.

But I had signed up for the 10 mile event (Potawatomi Trail Runs) which began at 8 pm on Friday, so I went ahead and attempted the event. I wasn’t prepared to compete, so I went out and walked at a deliberate pace…when I wasn’t falling down.


My ruined shoes (they were old anyway)
Yes, even my handheld flashlight got muddy

 

The course used to be single track with breaks on grassy prairie. Now, with the exception of the initial “almost a mile” meadow loop, the grassy parts looked like narrow dirt roads which were rivers of mud. And yes, the old shoe sucking mud was there in force.
In fact, on Golf Hill (about 4 miles into it..a rope assist hill), my left shoe got sucked into the mud. My trail gaiters were also sucked in..I lost them. I fell 2-3 times (on my butt) and was off balance many more times, but my knee stretching program (part of my fitness) kept me from getting hurt.
Also, the elastic in my headband gave way so I had to use my headlamp as a hand held, and I supplemented it with the above flashlight..that actually worked out very well.

It took me 4:09 to walk this 10 mile loop (I used to finish my 100’s on this course at that pace), but I had a lot left in the tank.

Today: I slept in, did a standard weight workout (light on the bench, 1 x 185, 7 x 165, 10 x 165 incline, usual military, pull ups were good: 5 sets of 10, 1 of 5, etc) and then caught some Bradley Softball (sadly, they lost two very competitive games to Missouri State..1 run each time.)

Babs and me cheering on BU Softball