Reductionist approach to nutrition: Where the birthday paradox meets p-values

in #stats, 7 years ago (edited)

Birthday Paradox

If there are two people in a room, the probability of them having the same birthday is about 0.27%. Put 10 people in that same room and the probability of having two people in the room with the exact same birthday goes up to about 11.7%. With twenty-three people, the probability of two people having the same birthday has already gone up to just over 50%. To many people, this little probability phenomenon, known as the birthday paradox, goes against their intuition about probabilities.

To understand the inner workings of the probability math, we need to look at the probability of people not having the same birthday. The probability of at least two people from the group having the same birthday is one minus the probability of none of the people having the same birthday.

The probability of none of the people in the room having the same birthday is where things grow interesting. Let's look at the situation with three people in the room. Let's call them Alice, Bob, and Carol. Alice can't have the same birthday as Bob. She also can't have the same birthday as Carol. On top of this, Bob and Carol can't have the same birthday. You could say that there are three links that have to check out. If we treat those links as independent, which is a close approximation rather than an exact result, the probability that no one has the same birthday is the probability that two random people don't have the same birthday, raised to the power of three. For four people this would be a power of six. For five people, a power of ten. In general: one minus the probability of two people having the same birthday, raised to a power equal to half the product of N and N-1, where N is the number of people in the room.
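As a rough check on the reasoning above, here is a minimal Python sketch that compares the exact birthday probability with the "links" approximation. The function names are my own, and the sketch assumes 365 equally likely birthdays and ignores leap years.

```python
def p_shared_exact(n, days=365):
    """Exact probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays that are already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different


def p_shared_links(n, days=365):
    """Approximation that treats all n(n-1)/2 links as independent."""
    links = n * (n - 1) // 2
    return 1.0 - (1.0 - 1.0 / days) ** links


for n in (2, 3, 10, 23):
    print(n, round(p_shared_exact(n), 4), round(p_shared_links(n), 4))
```

For small groups the two columns should be nearly identical; the approximation only drifts slowly away from the exact value as the room fills up.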

p-values

When trying to gauge if an association could be due to pure chance, p-values can be a helpful tool. For more info on p-values and their basic pitfalls, read my earlier post on the subject. Basically, p-values don't give the probability that the results are spurious; instead, they give the probability that, given a purely random effect, a result of at least the observed magnitude would occur.
This idea is all fine and dandy, even in nutrition, if we look at big pictures only. The problem arises, however, when we start zooming into the details. Want to look if there is a statistically significant association between the basic (protein/fat/carbohydrate) macro split and death from all medical causes? Great, run the trial and look at the p-values. You can even look at purely observational epi data; as long as we don't zoom in, everything is cool.
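To make the definition of a p-value given above a bit more concrete, here is a small sketch of a permutation test. This is just one possible way of estimating a p-value, chosen because it follows the definition almost literally; it is not the method used in any particular study. The variable names and the made-up random data are purely illustrative.

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Estimate how often pure chance gives a correlation at least this strong."""
    rng = random.Random(seed)
    observed = abs(correlation(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Shuffling ys destroys any real link with xs: a purely random effect.
        rng.shuffle(shuffled)
        if abs(correlation(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    # The +1 counts the observed ordering itself, so the estimate is never zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical data: two variables that are independent by construction.
data_rng = random.Random(42)
intake = [data_rng.gauss(0, 1) for _ in range(30)]
outcome = [data_rng.gauss(0, 1) for _ in range(30)]
print(permutation_p_value(intake, outcome))
```

Note what the number does and does not say: it is the fraction of shuffled, link-free data sets that look at least as convincing as the real one, not the probability that the observed association is spurious.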

Combining the two

So what happens when we apply a reductionist approach to nutrition, for example when, after looking at fat, carbohydrates, and protein, we start zooming into things more closely? Animal protein vs plant protein. Or what about individual amino acids? Saturated fats vs unsaturated fats. Monounsaturated versus polyunsaturated. Omega-6 fatty acids versus omega-3.
In the end, we end up with quite a list of macros. Oh, and we forgot about vitamins, dietary fiber, minerals, etc.

Oh and then there is mortality. We can split up medical cause mortality based on diseases. Many types of cancer. Many types of cardiovascular disease, renal disease, etc, etc.

So what happens if we find an association between, let's say, linoleic acid and bone cancer? Especially if we find such an association in studies that weren't looking for it in a completely targeted way? How do we interpret the p-values if we find such an association?

Well, remember our birthday paradox? In the birthday paradox scenario, we had N(N-1)/2 as the power to work with, making the probabilities go up. Now, not all associations are fair game: the association between vitamin C intake and leucine intake, for example, won't ever be considered relevant. Only links between an input variable and an endpoint variable count. Assuming a roughly equal number of input and endpoint variables, that leaves (N/2) times (N/2) links, so a worst case of N²/4 as the power is something to take seriously.
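The two ways of counting links can be put side by side with a few lines of Python. This is only an illustration of the counting argument; the function names are mine, and the even split between input and endpoint variables is the assumption made above.

```python
def all_pairs(n):
    """Links when any variable may pair with any other: n(n-1)/2."""
    return n * (n - 1) // 2


def input_endpoint_pairs(n_inputs, n_endpoints):
    """Links when only input-versus-endpoint pairs are considered."""
    return n_inputs * n_endpoints


for n in (10, 20, 40):
    half = n // 2
    # Even split between inputs and endpoints: about n squared over four links.
    print(n, all_pairs(n), input_endpoint_pairs(half, n - half))
```

Either way the number of links grows with the square of the number of variables, which is what drives the rest of the argument.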

P values under the birthday paradox scenario

While the real problem is slightly less catastrophic than the birthday paradox scenario would imply, the birthday paradox scenario does give us an easy-to-comprehend setting for showing the impact on the usefulness of p-values.

Remember our birthday example with the 0.27% probability, or p=0.0027, for any two persons to have the same birthday. A commonly used threshold for calling the p-value of an association statistically significant is 0.05.
Now suppose we have a data set consisting of ten completely independent variables and we look at associations between all possible combinations of two variables out of these ten. What would be the probability of finding at least one statistically significant association purely by chance?

The answer to that question is: a whopping 90%! Ten variables give 45 pairs, and one minus 0.95 to the power of 45 is about 0.90.
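The 90% figure can be checked both with the formula and with a quick simulation. The sketch below assumes, as the birthday scenario does, that the pairwise tests are independent, so that under pure chance every p-value is just a uniform random number between 0 and 1. Names and the number of trials are arbitrary choices of mine.

```python
import random


def p_any_significant(n_vars, alpha=0.05):
    """Chance of at least one p < alpha among all pairs, assuming independence."""
    pairs = n_vars * (n_vars - 1) // 2
    return 1.0 - (1.0 - alpha) ** pairs


def simulate(n_vars, alpha=0.05, trials=20000, seed=1):
    """Monte Carlo version: draw one uniform p-value per pair, many times over."""
    rng = random.Random(seed)
    pairs = n_vars * (n_vars - 1) // 2
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(pairs)):
            hits += 1
    return hits / trials


print(round(p_any_significant(10), 3))  # roughly 0.90
print(round(simulate(10), 3))           # should land close to the line above
```

In a real data set the pairwise tests share variables and are not strictly independent, so this is an approximation of the scenario, not an exact model of it.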

Inverting the question

So how about looking at it from the other side? We know we have ten variables and we know we will look at all possible pairs. What per-pair p-value threshold would we need in order to keep the p-value for the composite trial as a whole at a significant p<0.05?
That is, what threshold do we need to set in order to be sure that the probability of getting any spuriously significant result in our whole birthday paradox scenario stays below 5%?

Well, here are some numbers for different Ns:

  N = 1: not applicable (a single variable has no pairs)
  N = 2: 0.05
  N = 3: 0.016
  N = 4: 0.0085
  N = 5: 0.0051
  N = 6: 0.0034
  N = 7: 0.0024
  N = 8: 0.0017
  N = 9: 0.0013
  N = 10: 0.0011
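Thresholds of this kind can be computed by inverting the formula from the previous section. The sketch below does that under the same independence assumption (this inversion is commonly known as the Šidák correction) and, for comparison, also shows the simpler and slightly stricter Bonferroni rule of dividing 0.05 by the number of pairs. The function names are mine, and the results should land close to the figures listed above, up to rounding in the last digit.

```python
def pairs(n_vars):
    """Number of distinct pairs among n_vars variables."""
    return n_vars * (n_vars - 1) // 2


def sidak_threshold(n_vars, overall_alpha=0.05):
    """Per-pair threshold that keeps the overall false-alarm rate at overall_alpha."""
    k = pairs(n_vars)
    if k == 0:
        return None  # a single variable has nothing to be paired with
    return 1.0 - (1.0 - overall_alpha) ** (1.0 / k)


def bonferroni_threshold(n_vars, overall_alpha=0.05):
    """Simpler rule of thumb: divide the overall threshold by the number of pairs."""
    k = pairs(n_vars)
    if k == 0:
        return None
    return overall_alpha / k


for n in range(1, 11):
    print(n, pairs(n), sidak_threshold(n), bonferroni_threshold(n))
```

Both rules tell the same story: by the time there are ten variables, each individual association has to clear a bar of roughly 0.001 rather than 0.05.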

As stated, the problem isn't quite this big for real data sets that distinguish between input variables and endpoint variables, but just consider that ten diseases looked at against nine different macronutrients already gives 90 input-endpoint pairs, twice the 45 pairs of N=10 above. Everyone who has ever looked at, for example, huge epi data sets will realize that the number of potential input and output variables is normally much larger than nine versus ten, but more importantly that the kind of p-values needed to yield useful guarantees for the network of variables as a whole is not the kind of p-values commonly obtained in nutritional studies of whatever kind. That is, the required p-values for overall significance of the network as a whole quickly drop to the type of values observational studies haven't seen since smoking was linked to lung cancer.

Again, observational studies have great value. Even though many people would want to dispute that, they are important, especially for disputing spurious findings from other, higher-order types of evidence, and even apart from that, they can be of use for getting the big picture. It is just that for observational studies, the effect of birthday-paradox-like scenarios is such an obvious source of spurious nonsense that observational studies have borne the brunt of the critique regarding such spuriousness.

It can be argued though that not just observational studies but the very concept of a reductionist approach to nutrition and health creates a system of probabilities that demands higher standards and thus much lower threshold values for p-values than can reasonably be expected from nutritional associations.

I hope this blog post conveys a tiny bit of why I take issue with most of the (social) media coverage we are currently seeing surrounding the PURE study.

oh my , statistics! found you via MAP , good luck (-:

Love it! I take it you don't like the simplistic message of most social media posts :)
