For Nerd Eyes Only: Causality and controls in multiple regressions

Ezra Klein provides some really smart thoughts about statistical controls. He generously says that researchers already know the points he raises, but I’m not sure that’s typically true. The general topic is one I’ve had to think about a lot in studying politics and religion, fields where multiple regressions loaded with questionable variables cause big problems, so let me add a few points of my own.

The problem Klein wrestles with is one we might call Masking Causes with Midstream Effects. Klein describes, for example, how the race of a driver might influence the reasons a driver is pulled over by police, which might in turn influence the likelihood of being arrested. A researcher might run a multiple regression predicting likelihood of arrest following a traffic stop and include among the simultaneous predictors race, the reason for the traffic stop, and lots of other variables.

The Masking Causes with Midstream Effects problem comes when the researcher says something like: “Prior studies showed strong effects of race on arrest likelihood, but we show that race isn’t really a major influence when proper controls are included.” In fact, the study shows no such thing. If race is strongly influencing reasons for traffic stops, and reasons for traffic stops are strongly influencing arrest likelihood, then in fact race remains a strong cause of arrest likelihood despite the statistical controls. As Klein rightly points out, such a study really shows (some of) the (indirect) ways in which race influences arrest likelihood. It’s not that race is unimportant, but that race enters the causal chain early and has effects on lots of variables in the model.
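To make the masking problem concrete, here is a small simulation. It is a sketch under loudly stated assumptions: a linear model with a continuous stand-in for arrest likelihood, made-up effect sizes, and hypothetical `solve` and `ols` helpers written with Python’s standard library. By construction, race has no direct effect on arrest; all of its influence runs through the reason for the stop. The model that “controls” for the reason still reports a race coefficient near zero.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(y, *predictors):
    """Hypothetical helper: [intercept, b1, b2, ...] from least squares via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(42)
n = 50_000

# Assumed causal chain: race -> reason for stop -> arrest, with NO direct race -> arrest arrow.
race = [float(rng.random() < 0.5) for _ in range(n)]
reason = [1.0 * r + rng.gauss(0, 1) for r in race]
arrest = [0.8 * s + rng.gauss(0, 1) for s in reason]

# Model 1: race alone recovers its total effect (about 1.0 * 0.8 = 0.8).
total_effect = ols(arrest, race)[1]

# Model 2: "controlling" for the midstream variable pushes the race coefficient toward 0,
# even though race is, by construction, the root cause of the arrest gap.
_, race_controlled, reason_coef = ols(arrest, race, reason)

print(f"race alone:             {total_effect:.2f}")
print(f"race, reason controlled: {race_controlled:.2f}")
print(f"reason:                  {reason_coef:.2f}")
```

In this setup the drop in the race coefficient between the two models is the indirect effect flowing through the reason for the stop, not evidence that race is unimportant.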

In theory, there are solutions to the problem. For example, the researcher might expand the model into something like a path analysis, using discrete steps to show how variables really early in the causal chain (like race) affect variables in the middle of the causal chain (like reasons for the traffic stop), and then how both the early and midstream variables jointly affect the study variable (here, arrest likelihood). The researcher then needs to be careful in describing the results, making it clear that the study shows both direct and indirect effects of variables early in the chain rather than muted or null effects of variables early in the chain.
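The simplest version of such a path analysis can be sketched with two regressions: one for the midstream variable on the early variable, and one for the outcome on both. The example below is hypothetical (standard-library Python; the `ols` helper and all effect sizes are made up for illustration) and assumes a small direct effect of race on top of the indirect route through the reason for the stop. For linear least squares fit on a single sample, the total effect equals the direct effect plus the product of the two path coefficients, so none of race’s influence disappears; it is only split into parts.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(y, *predictors):
    """Hypothetical helper: [intercept, b1, b2, ...] from least squares via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(7)
n = 20_000

# Assumed linear paths: race -> reason (a = 1.0), reason -> arrest (b = 0.8),
# plus a direct race -> arrest path (0.3).
race = [float(rng.random() < 0.5) for _ in range(n)]
reason = [1.0 * r + rng.gauss(0, 1) for r in race]
arrest = [0.3 * r + 0.8 * s + rng.gauss(0, 1) for r, s in zip(race, reason)]

a = ols(reason, race)[1]                  # step 1: early variable -> midstream variable
_, direct, b = ols(arrest, race, reason)  # step 2: direct effect of race, and midstream -> outcome
total = ols(arrest, race)[1]              # race's total effect, with no midstream control
indirect = a * b                          # effect of race that travels through the reason

print(f"direct {direct:.3f} + indirect {indirect:.3f} = {direct + indirect:.3f}; total {total:.3f}")
```

Reporting both the direct and the indirect pieces, rather than only the coefficient from the second regression, is what keeps the write-up from implying that race has a “muted” effect.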

But wait, it gets so much worse

A study of arrests from traffic stops provides an unusually clean situation. We know that there are some facts that existed long before the traffic stop (like the driver’s race, socioeconomic status, and so on). And then the event itself unfolds more-or-less linearly over time — first the person is pulled over, then there are early interactions between the driver and the officer, the vehicle may be searched or not, then various things are found or aren’t, and an arrest happens or doesn’t. Constructing a path analysis here is pretty straightforward: The background characteristics come first, followed by the early driver-officer interactions, followed by subsequent events, leading ultimately to the decision to arrest or not.

Yet many things researchers study don’t provide as clear a case for making plausible guesses about causal priority. In political studies predicting public opinion on issues, for example, researchers often run multiple regressions that enter background demographics (race, income, gender, religion, and so on) as simultaneous predictors alongside things like liberal-conservative self-placement, political party preference, and various kinds of “values” or “moral foundations” or political “personality” variables.

While most of the demographic items are plausibly early in the causal chain, placing ideology and party and values in the causal chain before issue opinions is largely speculative. There are strong arguments that things like ideologies and party affiliations, for example, are in fact often effects rather than causes of issue opinions. With “values” and the like, it gets worse still — often these items are virtually indistinguishable from the political opinions they’re meant to “explain” (something Kurzban and I called DERP Syndrome).

Here, multiple regressions can become extremely dicey. The problem, in short, is that multiple regressions can’t tell causes from effects. Multiple regressions are perfectly happy to report that a big effect of Variable X is really a big cause of Variable X, or that a variable that shares many of the same background causes with Variable X (even though it doesn’t itself directly or indirectly cause Variable X) is really a big cause of Variable X. Further, when big effects and/or causal siblings of a predicted variable are used as predictors in a multiple regression, the model is perfectly happy to report that actual causes aren’t really big causes, or that utterly unrelated variables are actually modest causes. (You can find an extended example on pages 227-235 of my recent book.)
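The claim that regressions can’t tell causes from effects is easy to check with a toy simulation. In the standard-library Python sketch below, the names, the `ols` helper, and every effect size are hypothetical. An issue opinion plays the role of Variable X: it has one true cause (a demographic), and an ideology score is generated purely as an effect of the opinion. When both are entered as predictors, the regression credits the effect with most of the “explanation” and shrinks the real cause from its true value of 1.0 to roughly 0.2.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(y, *predictors):
    """Hypothetical helper: [intercept, b1, b2, ...] from least squares via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(3)
n = 50_000

# Assumed structure: demographic -> opinion (true effect 1.0).
# Ideology is generated FROM the opinion, so it is an effect and causes nothing here.
demographic = [rng.gauss(0, 1) for _ in range(n)]
opinion = [1.0 * d + rng.gauss(0, 1) for d in demographic]
ideology = [o + rng.gauss(0, 0.5) for o in opinion]

# The demographic alone gets roughly its true coefficient.
demo_alone = ols(opinion, demographic)[1]

# Adding the effect as a "control": the effect looks like a big cause,
# and the real cause looks small.
_, demo_controlled, ideology_coef = ols(opinion, demographic, ideology)

print(f"demographic alone:      {demo_alone:.2f}")
print(f"demographic + ideology: {demo_controlled:.2f}")
print(f"ideology:               {ideology_coef:.2f}")
```

Both specifications fit the data without complaint; nothing in the regression output itself says that the second one has the causal arrow pointing the wrong way.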

So the Masking Causes with Midstream Effects problem is just the tip of the iceberg. There’s also the Mistaking Effects of X as Causes of X problem, the Mistaking Siblings of X as Causes of X problem, the Masking Causes of X with Effects and Siblings of X problem, the Mistaking Unrelated Variables as Causes of X Because of Their Relationships with Mistakenly Included Effects and Siblings of X problem, and so on.

Back when I was working on my dissertation and began recognizing these problems, I wrote: “Regressions are dangerous toys, and we are all clumsy children.” And the longer I work at this stuff, the more terrifying it becomes.

Take the simple case of the traffic stop. Imagine, for example, an unmeasured variable (call it Officer Racism) that is brought into play by the race of the driver and then has big effects both on the likelihood of pulling the person over for Reason X and on the likelihood of searching and arresting the person. In that case, the correlation between being pulled over for Reason X and the arrest might not be (mostly) a causal relationship at all; the two variables might instead be largely “siblings” of each other. Still, a multiple regression might be happy to report a big causal arrow from Reason X to the arrest, and might be happy to deflate the importance of driver’s race along the way.
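The Officer Racism scenario can be simulated the same way. This is a hypothetical standard-library Python sketch: the `ols` helper, the linear structure, and the effect sizes are all assumptions chosen for illustration. Race drives an unmeasured variable, which in turn drives both the reason for the stop and the arrest; there is deliberately no causal arrow from the reason to the arrest. The regression nonetheless reports a sizable “effect” of the reason and cuts race’s coefficient roughly in half.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(y, *predictors):
    """Hypothetical helper: [intercept, b1, b2, ...] from least squares via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(11)
n = 50_000

# Assumed structure: race -> officer_racism (never given to the regression),
# and officer_racism -> reason, officer_racism -> arrest. Reason does NOT cause arrest.
race = [float(rng.random() < 0.5) for _ in range(n)]
officer_racism = [1.0 * r + rng.gauss(0, 1) for r in race]
reason = [o + rng.gauss(0, 1) for o in officer_racism]
arrest = [o + rng.gauss(0, 1) for o in officer_racism]

# Race alone: about 1.0, its full (indirect) effect via officer_racism.
race_alone = ols(arrest, race)[1]

# Adding the sibling variable: reason picks up a spurious coefficient (about 0.5)
# and race is deflated (about 0.5).
_, race_controlled, reason_coef = ols(arrest, race, reason)

print(f"race alone:       {race_alone:.2f}")
print(f"race with reason: {race_controlled:.2f}")
print(f"reason:           {reason_coef:.2f}")
```

Because the sibling is correlated with the hidden cause, it soaks up credit that belongs to the chain running from race through Officer Racism.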

So what’s to be done?

Some researchers think these are reasons to run experiments rather than use correlational data. But experiments have their own problems. Primarily, we’re often interested in things that can’t really be investigated with experiments. Some of these are obvious — we can’t, for example, randomly assign different races or religions to people and see how things turn out. Some are less obvious — many of the things we care about are complex and enduring dispositions that take years for people to form (say, people’s religiosity or political views); studying how these things change a bit after spending a few minutes in a lab doesn’t necessarily tell us anything interesting about how these things reach their more stable baselines over time. There are plenty of unnoticed and not-really-defensible assumptions in much of the experimental work that equates short-term changes from baselines with the long-term causes of those baselines.

My own response has been to go ahead and use correlational data, but to be much more explicit and suspicious about causal assumptions. Regressions are mostly OK as tools, so long as we know what they can and can’t tell us. And, crucially, a single regression can’t tell us when we’ve made mistaken causal assumptions. We have to do extra work to gather clues about that through mediation patterns and so on. We have to occasionally limit ourselves to a restricted predictor batch containing items more plainly defensible as early in the causal chain, and then only move beyond that batch in tentative steps. We have to be especially careful about saying “X doesn’t matter” when it’s something that’s plainly early in the causal chain and appears to do good work when not flooded with questionable controls.

These points are on display in my recent work on politics. We leave out of our regression models lots of predictors that are the central stars of others’ work, because we suspect those stars are largely effects and siblings and only occasionally causal. We focus mostly on demographic items we think we can defend as early in the causal chain.

And, in fact, we come to a different set of conclusions from many political scientists, not only because we have a different view of how minds work, but also because we have a different view about the need to be cautious with regressions. Rather than assuming that the biggest correlations are obviously the causes that matter most, we keep our focus on the effects of variables that are most defensibly viewed as causal.

We can’t just throw a bunch of variables at something and think a multiple regression will sort it all out for us. We have to go slower and deeper, make explicit our causal assumptions, make explicit our uncertainty about those assumptions, and engage in the trade-offs and messy descriptions that uncertainty requires.