Correlation or Regression — Which Test Should I Use?
Correlation and regression are both statistical methods that examine relationships among variables, but they have different applications and are often misused.
Most people have seen graphs with scattered data points bordered by a vertical measurement scale (conventionally called the y-axis) and a horizontal measurement scale (the x-axis). Each point represents an individual item, and its position on the scatter diagram indicates that item's measurements on the two variables.
The paired variables may be sales of a product and advertising expenditure, or height and weight of children. So if you measure the height and weight of each child in a classroom and plot the data on a scattergram, you will have a visual description of the relationship of the two variables.
Correlation Analysis
A scattergram also allows you to predict the values of one variable (e.g. height) from information on the other variable (e.g. weight). But how confident you can be in that prediction depends on the strength of the association between the variables. Correlation analysis is a statistical method used to measure this strength.
Only when two variables are found to be statistically associated (or correlated) is the information deemed useful in making a prediction. But it is important not to assume that variations in one variable (e.g. body height) are caused by variations in the other variable (e.g. body weight).
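As an illustration of how this strength can be quantified, the sketch below computes the Pearson correlation coefficient, one common measure of linear association, which ranges from -1 to +1. The height and weight values are made up for the example, and the function name pearson_r is simply a label chosen here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data: heights (cm) and weights (kg) of six children.
heights_cm = [120, 125, 130, 135, 140, 145]
weights_kg = [22, 25, 27, 31, 33, 38]

r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.3f}")
```

A value close to +1 or -1 indicates a strong linear association and a value near 0 a weak one. The coefficient says nothing about which variable, if either, causes the other.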
Regression Analysis
There are times, however, when it is reasonable to suggest that one variable might be responsible for causing the other variable to change. For example, exercise increases our heart rate, but not the other way round. Therefore heart-rate change is dependent on exercise (e.g. jogging). In this example, heart rate is appropriately termed the dependent variable and jogging is the independent variable.
When dependent and independent variables are involved, the degree of association between variables is determined by regression analysis. On a scattergram, the dependent variable (measured as number of heartbeats per minute) is represented by the y-axis and the independent variable (distance jogged) by the x-axis. If the relationship is found by regression analysis to be significant, then a line representing the average relationship in the data is drawn through the scattered points; its slope shows how much the dependent variable changes, on average, for each unit change in the independent variable.
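A minimal sketch of fitting such a line by ordinary least squares is given below, assuming a straight-line relationship. The jogging distances and heart rates are invented for illustration, fit_line is a name chosen here, and the significance test mentioned above is not shown.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the independent variable must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # average change in y per unit of x
    intercept = mean_y - slope * mean_x    # fitted value of y when x is 0
    return slope, intercept

# Made-up illustrative data: distance jogged (independent, x-axis)
# and heart rate (dependent, y-axis).
distance_km = [0, 1, 2, 3, 4, 5]
heart_rate_bpm = [70, 95, 110, 128, 140, 155]

slope, intercept = fit_line(distance_km, heart_rate_bpm)
predicted_bpm = intercept + slope * 2.5   # predicted heart rate after 2.5 km
print(f"heart rate = {intercept:.1f} + {slope:.1f} * distance")
print(f"predicted at 2.5 km: {predicted_bpm:.0f} bpm")
```

Here the slope is read as the average increase in heart rate per kilometre jogged, which is the kind of statement that only makes sense once one variable has been designated as dependent on the other.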
Main Difference Between the Two Types of Statistics
In correlation, by contrast, either variable can be on the y- or the x-axis because the analysis does not assume one variable is dependent on the other. This is the main difference between regression and correlation: the former involves dependent and independent variables (or cause and effect), and the latter does not.
It is sometimes difficult to determine whether values of one variable are dependent upon the values of another variable. For example, it may be expected that sales of a product are dependent on how much advertising is done. But then again, the amount of advertising a company allocates may depend on the sales revenue. If in doubt, always use correlation rather than regression; which variables are dependent and which are independent must be obvious or justified before using regression.
In the heart rate/jogging example, there is little doubt that jogging affects heart rate, and regression analysis may be used to determine how strong this effect is (as opposed to correlation analysis, which determines how strongly two variables vary together).
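The difference can be seen numerically. In the sketch below, which again uses invented jogging data and a helper name chosen for this example, the correlation coefficient is the same whichever variable is placed on which axis, whereas the least-squares slope changes when the dependent and independent variables are swapped.

```python
import math

def deviation_sums(xs, ys):
    """Return (sxx, syy, sxy): sums of squared deviations and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy

# Made-up illustrative data.
distance_km = [0, 1, 2, 3, 4, 5]
heart_rate_bpm = [70, 95, 110, 128, 140, 155]

sxx, syy, sxy = deviation_sums(distance_km, heart_rate_bpm)

# Correlation treats both variables alike: swapping them leaves r unchanged.
r = sxy / math.sqrt(sxx * syy)

# Regression does not: each choice of dependent variable gives its own line.
slope_hr_on_distance = sxy / sxx   # heart rate as the dependent variable
slope_distance_on_hr = sxy / syy   # distance as the dependent variable

print(f"r = {r:.3f}")
print(f"slope, heart rate on distance: {slope_hr_on_distance:.3f}")
print(f"slope, distance on heart rate: {slope_distance_on_hr:.4f}")
```

The two slopes are not simply reciprocals of each other unless the points lie exactly on a line; their product equals r squared. This is one reason the choice of dependent variable has to be justified before regression is used.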
Various Regression and Correlation Tests
Although linear regression and correlation (straight-line description of the average relationship) involving two variables are probably the most common statistics used, regression and correlation can also be curvilinear (curved) and involve three or more variables.
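To illustrate why a straight-line measure is not always adequate, the sketch below compares the Pearson coefficient with the Spearman rank correlation (the Pearson coefficient computed on ranks) for made-up data in which y rises steadily with x but along a curve. The helper names are chosen for this example only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (measures linear association)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up data: y always increases with x, but along a curve (y = x cubed).
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Because the relationship is perfectly monotonic, the rank-based coefficient reaches 1, while the linear coefficient falls short of 1 since no straight line passes through every point.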
It is not difficult to choose the right test to use for your data, as long as you understand the applications of regression and correlation. The various correlation and regression methods (multiple regression, Spearman rank correlation, etc.) are widely described in the literature; below are some excellent examples:
Armitage P, Berry G & Matthews JNS. Statistical Methods in Medical Research. 4th Ed. Blackwell Science.
Edwards AL. An Introduction to Linear Regression and Correlation. Freeman.
Miles J & Shevlin M. Applying Regression and Correlation: A Guide for Students and Researchers. Sage Publications.
Copyright Ken Chan. Contact the author to obtain permission for republication.
