I love working with graduate students because I almost always learn something while I’m helping them. One of the most common topics that students come to me with is regression analysis. You know, that scary statistical technique for finding the equation of a line. Well, it’s not all that scary, and there are lots of fun things you can do with it. But students always have the sense that there is something more to it. Believe me, there’s not. If you’ve taken an even moderately rigorous course in graduate-level applied stats, you know regression. Let’s get into this.

This came up in a discussion recently with one of my students. First, a little background. Years ago, I interviewed for a job with an elite research institute. I didn’t know it was elite when I interviewed; if I had known, I would have had the terrible jitters. The man who interviewed me was a world-class applied statistician who had a reputation for striking fear into the heart of even the executive director of the institute. I didn’t know that either at the time I interviewed. He asked me, “What do you think of regression?” What an odd question. My answer was equally odd. I said, “It’s a good excuse for bad research.” And by the way, I got the job. My answer was the right one then and it is still the right one now.

So, what is regression anyway? It is simply a method for finding the equation of a line through a given set of points. We all remember (or maybe you have repressed it) the equation of a line from high school algebra: y = ax + b, where a is the slope and b is the intercept. Well, this is great. If we know the coordinates of any two points on a plane, we can determine the equation of the line through those two points. But what happens when we have a gazillion points that don’t all lie on the same line, such as in the figure below? Regression allows us to find the straight line of best fit for all of those data points. That is all there is to it.
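To make the idea of a best-fit line concrete, here is a minimal Python sketch. The five points are made up for illustration, and `fit_line` is a hypothetical helper name, not something from a particular library. It uses the standard closed-form formulas for the slope and intercept of a simple regression line.

```python
# Made-up points that roughly follow a line (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(xs, ys):
    """Return slope a and intercept b of the best-fit line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept: the line passes through the means
    return a, b


a, b = fit_line(xs, ys)

# Vertical misses between each point and the fitted line.
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]

print(f"y = {a:.2f}x + {b:.2f}")
print("sum of residuals:", round(sum(residuals), 10))
print("sum of squared residuals:", round(sum(r * r for r in residuals), 4))
```

Given just two points, the same function returns the exact line through them, which is the high school case. With more points, the misses above and below the fitted line cancel out, so their plain sum comes out at essentially zero, while the sum of their squares does not.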
Regression uses a criterion called least squares, which ensures that the sum of the squared vertical distances between the line and all the data points is the smallest possible. This means that the line is as close as possible to all the data points. Why is it called least squares? Because before we add up all of the distances between the line and the points in the set, we square them so that the positive and negative distances don’t cancel each other out and sum to zero. Multiple regression is just an extension of simple regression to more than one independent variable.

Why then is regression a good excuse for bad research? To do a proper analysis with regression, you develop an a priori hypothesis that certain independent variables will have a significant relationship to the dependent variable. Those variables get included in your analysis. What happens all too often in practice is that researchers will collect a whole bunch of data and try different combinations of variables until they find the one or ones that give them the best results. This is a big no-no and can lead you down the primrose path.

Here’s an example. Shortly after I took the job at the research institute mentioned above, we were asked to take a look at a test that the institute had developed to help screen job applicants. The client said it wasn’t working. Sure enough, when we analyzed the new data, there was no relationship between the predictors (scores on the test) and subsequent job performance. When we investigated a little further, we learned that the team that had done the original research analyzed the heck out of the data until they found something that seemed to work. The client had spent a lot of money on developing this test, and the researchers felt they needed to find something that worked. They had inadvertently handed the client an empty bag!

The problem, of course, with analyzing the heck out of a set of data is that if you have enough time and enough variables, you are bound to find some significant results. That’s what we call run on alpha! Don’t do it.
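To see how easily this happens, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the outcome and all 1,000 "predictors" are pure random noise, the function names are hypothetical, and the p-value comes from the Fisher z approximation (chosen only to stay within the standard library) rather than the exact t-test a stats package would use.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Approximate two-sided p-value for r using the Fisher z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_false_positives(n_people=50, n_predictors=1000, alpha=0.05, seed=0):
    """Count noise predictors that look 'significant' against a noise outcome."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    hits = 0
    for _ in range(n_predictors):
        # Each predictor is unrelated to the outcome by construction.
        predictor = [rng.gauss(0, 1) for _ in range(n_people)]
        r = pearson_r(predictor, outcome)
        if approx_p_value(r, n_people) < alpha:
            hits += 1
    return hits


hits = count_false_positives()
print(f"{hits} of 1000 pure-noise predictors look 'significant' at alpha = .05")
```

None of these predictors has any real relationship to the outcome, yet a steady trickle of them clears the significance bar anyway. Keep testing variables long enough and something will always seem to work.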
Even when no real relationship exists, you will get significant differences or relationships about 5% of the time by chance alone if you set alpha at .05.

What did my student learn? What did I learn while working with her? Tune in next month and you’ll find out.
Author: Ed Siegel
May 2023