A Brief Overview of Causal Inference
We've all been taught that correlation does not imply causation. This is why we use randomized controlled trials, when possible, to test whether some factor (such as salt) affects some other variable (such as blood pressure). If we just look at salt consumption outside of a controlled study, we may find an association between salt intake and blood pressure, but that association could be due to another factor that raises both salt intake and blood pressure. For example, people who eat a lot of fast food will get a lot of salt and a lot of calories. It may be that the resulting weight gain actually causes the high blood pressure, not the salt.
Unfortunately, we cannot always easily conduct controlled trials to look for causal effects. Imagine trying to do a controlled trial on the long-term effects of smoking on lung cancer. To do this, you would have to randomly assign half the subjects to smoke a specific number of cigarettes per day and force the other half not to smoke. This would be unethical and probably difficult to enforce. Even with salt, some are pushing for a controlled trial using volunteers from prisons, where the food could be precisely controlled for years.
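The salt example above can be made concrete with a small simulation. The following is only a minimal sketch in Python, not a model of real physiology: the variable names, the effect sizes, and the assumption that salt has no effect at all on blood pressure are made up for illustration. In this toy world, fast food raises both salt intake and weight, and only weight raises blood pressure. The sketch compares the slope of blood pressure on salt in observational data with the slope when salt intake is assigned at random, as it would be in a controlled trial.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def simulate(n=20000, randomized=False, seed=0):
    """Generate (salt, blood_pressure) pairs for a hypothetical population.

    All quantities are in arbitrary standardized units (assumption).
    """
    rng = random.Random(seed)
    salt, blood_pressure = [], []
    for _ in range(n):
        fast_food = rng.gauss(0, 1)            # the confounder
        weight = fast_food + rng.gauss(0, 1)   # fast food raises weight
        if randomized:
            # Trial: salt intake is assigned independently of diet.
            s = rng.gauss(0, 1)
        else:
            # Observational: fast food also raises salt intake.
            s = fast_food + rng.gauss(0, 1)
        # Assumption: only weight affects blood pressure; salt does not.
        bp = 2.0 * weight + rng.gauss(0, 1)
        salt.append(s)
        blood_pressure.append(bp)
    return salt, blood_pressure


observational_slope = slope(*simulate(randomized=False))
randomized_slope = slope(*simulate(randomized=True))

print(f"observational slope: {observational_slope:.2f}")
print(f"randomized slope:    {randomized_slope:.2f}")
```

Under these assumptions the observational slope comes out near 1 even though salt does nothing, because salt intake is a marker for fast food consumption. The randomized slope comes out near 0, because random assignment breaks the link between salt and the confounder.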
While randomized controlled trials will continue to be the gold standard for testing causal effects, the last several decades have seen considerable breakthroughs in inferring causal effects from observational data, such as that found in electronic health records and public health surveillance systems. Using this data to develop possible causal hypotheses or to estimate causal effects is an essential part of a learning healthcare system.
When I started to learn about causal inference, I found the literature very confusing, even though there are many well-written papers and books on the subject. It was as if different blind men were each describing part of an elephant. When I get that confused, I find it best to prepare a lecture to teach someone else the topic. The first draft of that presentation is now available on GitHub Pages. The presentation is dynamically generated using R Markdown and the excellent presentation package xaringan. All of the code is available through the GitHub repo.
You can view the slides here: https://tjohnson250.github.io/overview_causal_inference/overview_causal_inference.html#1
And view the code at this repo: https://github.com/tjohnson250/overview_causal_inference