Income, perceived status, and opinions on redistribution (For Nerd Eyes Only)

Here’s a cautionary tale about small samples, researcher degrees of freedom, how misleading published studies can be, and how ridiculously misleading press releases can be.

First, though, here are some background facts. The U.S. General Social Survey has long included an objective measure of family income in dollars (REALINC), a subjective measure of family income ranging from “far below average” to “far above average” (FINRELA), and a number of policy items, including one asking whether the government “ought to reduce income differences between the rich and the poor, perhaps by raising taxes on wealthy families or by giving income assistance to the poor” or whether “government should not concern itself with reducing this income difference between the rich and the poor” (EQWLTH).

Looking at the GSS data from 2002-2012 (including those individuals with data on all three items; N = 6,837), the correlation between REALINC and FINRELA is .501 (naturally, given that one’s actual income is a big hint about whether one is richer or poorer than most people), the correlation between REALINC and EQWLTH is .207 (people with objectively higher incomes are more likely to oppose income redistribution), and the correlation between FINRELA and EQWLTH is .192 (people with subjectively higher incomes are more likely to oppose income redistribution).

If we predict EQWLTH simultaneously with REALINC and FINRELA, the standardized beta for REALINC is .148, the standardized beta for FINRELA is .117, and the model has an overall multiple correlation of .231. This indicates that neither REALINC nor FINRELA dominates the other in predicting EQWLTH (both predictors retain significant coefficients, and neither is materially larger than the other), and that neither predictor is much of an improvement on the other (the multiple correlation of .231 is only marginally better than the two predictors’ individual correlations of .207 and .192).
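For anyone who wants to check this, here’s a rough sketch of the computation in Python. The DataFrame name (gss) and the data-loading step are assumptions of mine; the columns simply mirror the GSS mnemonics above.

```python
import statsmodels.api as sm

# Assumption: `gss` is a pandas DataFrame holding the 2002-2012 GSS extract,
# with columns named after the GSS mnemonics used in the text.
complete = gss[["REALINC", "FINRELA", "EQWLTH"]].dropna()

# Pairwise correlations (the .501, .207, and .192 figures above).
print(complete.corr())

# Standardize everything so the OLS coefficients are standardized betas
# (the .148 and .117 figures); sqrt(R-squared) is the multiple correlation (.231).
z = (complete - complete.mean()) / complete.std()
fit = sm.OLS(z["EQWLTH"], sm.add_constant(z[["REALINC", "FINRELA"]])).fit()
print(fit.params)            # standardized betas
print(fit.rsquared ** 0.5)   # multiple correlation
```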

Pretty simple so far, right? Basically, objective income and subjective income are largely redundant predictors of views on income redistribution, each correlating at around .2. And this isn’t a theory; it’s a fact about a very large and representative sample.

Bring on the small samples

In a new article in Psychological Science titled “Subjective Status Shapes Political Preferences,” a team of social psychologists looks at these matters. They include objective measures of socioeconomic status (SES), a subjective measure of social status (the MacArthur Ladder), and a scale combining various policy preferences relating to income redistribution.

Like many psychological studies, they use small samples. They include a power analysis assuming they’re looking for a .30 correlation, which leads them to samples in the 100 to 200 range. Really, though, we already know they’re looking for a correlation around .20, which means they needed a sample more in the 250-300 range.
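For reference, here’s the standard Fisher-z back-of-the-envelope for the sample size needed to detect a given correlation. I don’t know exactly which power routine the authors used, so treat the specific outputs as ballpark figures rather than a reconstruction of their calculation.

```python
import numpy as np
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation of r (two-tailed test),
    using the standard Fisher z-transformation formula."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return int(np.ceil(((z_alpha + z_power) / np.arctanh(r)) ** 2 + 3))

for r in (0.30, 0.20):
    for power in (0.80, 0.90):
        print(f"r = {r}, power = {power}: N ~ {n_for_correlation(r, power=power)}")

# r = .30 comes out around 85-115; r = .20 comes out around 195-260.
```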

They have two samples measuring objective and subjective SES, one with 135 participants and the other with 152. The second sample also included some false (randomly assigned) feedback on relative SES prior to measuring subjective SES. They report for the first sample (a) that, predicting views on income redistribution, neither income nor education carries a significant coefficient in a multiple regression that also includes liberal-conservative ideology and party affiliation and (b) that subjective SES does significantly predict redistribution views in a multiple regression with all the prior predictors. They report for the second sample (a) that both objective income and subjective SES have significant correlations with redistribution views and (b) that the false-feedback manipulation resulted in differences (without controlling for anything else), in separate analyses, in both subjective SES and redistribution views.

Now, I’m always a bit skeptical when I see similar samples analyzed in different ways. Why were we shown correlations without controls in sample 2 but not sample 1? Why were lib-con ideology and party affiliation used as controls in sample 1 but not sample 2? Why was objective income used as a control in testing subjective SES in sample 1 but not sample 2?

Thankfully, the researchers posted the data online, so I was able to find out the answers to these questions.

We were shown correlations in sample 2 but not sample 1 because the correlation between subjective SES and redistribution views was not significant in sample 1 (p = .237). In sample 1, the relationship between subjective SES and redistribution views is similarly not significant with objective income in the model (p = .161) or with both income and education in the model (p = .124). Add lib-con ideology to the model and subjective SES is almost significant (p = .065). And, at long last, add party identification to the model and we’re finally there (p = .011).
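If you want to see the pattern in the posted data yourself, the whole exercise is just a series of nested regressions, watching the p-value on subjective SES as the controls pile up. The column names below are hypothetical placeholders for whatever the posted file actually calls them, with df standing in for the Study 1 data.

```python
import statsmodels.api as sm

# Assumption: `df` holds the posted Study 1 data; these column names are
# hypothetical placeholders, not the file's actual variable names.
predictor_sets = [
    ["subjective_ses"],
    ["subjective_ses", "income"],
    ["subjective_ses", "income", "education"],
    ["subjective_ses", "income", "education", "ideology"],
    ["subjective_ses", "income", "education", "ideology", "party_id"],
]

# The p-value on subjective SES, model by model (the .237 -> .161 -> .124 ->
# .065 -> .011 sequence described above).
for preds in predictor_sets:
    fit = sm.OLS(df["redistribution"], sm.add_constant(df[preds]),
                 missing="drop").fit()
    print(preds, round(fit.pvalues["subjective_ses"], 3))
```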

So this is the first hidden ball — in fact, in sample 1, the only way subjective SES significantly predicts redistribution views is when party identification is in the model. This is despite the fact that party identification otherwise plays no role in the theory or discussions in the paper. And this is despite the fact that party identification is as likely to be an effect as a cause of core views on income redistribution, and so should be viewed with real caution when thrown into a multiple regression that can’t tell the difference between causes and effects.

Contrast this with the reason given in the paper for including the ideology and party controls in sample 1: “Ideology and party affiliation were controlled because they tend to be associated with both income and attitudes toward redistribution, and we wished to reduce the possibility of third-variable explanations for any observed association between subjective SES and support for redistribution.” This is highly misleading, as it implies that subjective SES had a stand-alone relationship with redistribution views, and they were just making sure to rule out alternate explanations. In fact, the stand-alone relationship didn’t exist and required dubious “controls” to arise in the first place.

In sample 2, we see the opposite problem. Here, the stand-alone relationship between subjective SES and redistribution views is significant (p = .022), but not when objective income is entered into the model (p = .203) or when running the full model from sample 1 (p = .214). So this is the second hidden ball — they show the correlations but don’t run a model that controls for anything.

The issue here is that objective income was not a significant correlate of redistribution views in sample 1 (p = .906), but it was in sample 2 (p = .038).

Then there’s a further problem. They ran a false-feedback manipulation in sample 2, which in fact impacted subjective SES (p < .001) and redistribution views (p = .008). But then their theoretical claims imply a model they never test, namely, one in which the false-feedback condition affects subjective SES and subjective SES then affects redistribution views. In fact, when I run a multiple regression predicting redistribution views with both subjective SES and condition, subjective SES has a non-significant coefficient (p = .121) while condition is significant (p = .041). Add objective income to the model and now subjective SES drops to a dismal p = .645.
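That untested model is a one-liner, by the way. Here’s a sketch, again with hypothetical column names and with df2 standing in for the posted Study 2 data.

```python
import statsmodels.api as sm

# Assumption: `df2` holds the posted Study 2 data; column names are hypothetical,
# and `condition` is the randomly assigned false-feedback condition (coded 0/1).
m1 = sm.OLS(df2["redistribution"],
            sm.add_constant(df2[["condition", "subjective_ses"]]),
            missing="drop").fit()
print(m1.pvalues)   # subjective SES around p = .121, condition around p = .041

# Add objective income and subjective SES fades further.
m2 = sm.OLS(df2["redistribution"],
            sm.add_constant(df2[["condition", "subjective_ses", "income"]]),
            missing="drop").fit()
print(m2.pvalues)   # subjective SES around p = .645
```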

Again, contrast this with the claim in the paper: “Because random assignment experimentally controlled for objective SES in Study 2, we did not statistically control for this variable.” This isn’t as bad as the statement regarding why they controlled for ideology and party in study 1, but it’s still pretty bad given the context of the paper. In study 1, they made a big deal out of how objective income doesn’t really predict redistribution and doesn’t interfere with subjective income in predicting redistribution views. In study 2, they had data directly contradicting those findings and obscured rather than disclosed that fact.

So there are problematic conclusions in the study. They say: “In Study 1, feeling higher in relative status was associated with lower support for redistribution.” Well, OK, but only when controlling for party identification (a variable not at all central to their theory) and not otherwise. Further: “In Study 2, feeling higher in status caused reduced support for redistribution.” Actually, the manipulation caused differences in feelings about status and separately caused differences in support for redistribution, but the feelings about status didn’t cause the support for redistribution.

Bring on the press release

In the press release, things really get out of hand. I’m not blaming the study authors here — they probably had some input, but might not have had much ultimate control over the final version of the press release.

Start with the title of the press release: “Feeling — Not Being — Wealthy Drives Opposition to Wealth Redistribution.” This statement is supported by study 1 but immediately contradicted by study 2. Further, as I showed at the outset, when we go to a big publicly available sample (N = 6,837) rather than relying on samples in the mid-100s, the statement is plainly false insofar as it implies that subjective views of income matter, but objective measures of income don’t, in predicting redistribution views.

Then the first sentence: “People’s views on income inequality and wealth distribution may have little to do with how much money they have in the bank and a lot to do with how wealthy they feel in comparison to their friends and neighbors.” Again, this contradicts study 2 and the GSS data. Further, it sort of ignores the obvious point that people’s subjective views of their SES correlate quite strongly with their, umm, actual incomes (e.g., a correlation of .501 in the GSS sample, and correlations of .472 and .613 in the two samples in the paper the press release discusses). Do they really mean to imply that how much money people have doesn’t have a strong effect on how wealthy they feel?

And then later: “Support for redistribution wasn’t related to participants’ actual household income.” Again, true of study 1 but contradicted by study 2.

It’s not all bad news

So I’ve been pretty hard on the authors of the study and the press release. Let me say a couple of things in the study authors’ defense.

First, the study does have some really good things going on. Primarily, the experimental manipulations in studies 2, 3, and 4 are way cool. The findings suggest that artificially manipulating one’s perceptions of one’s relative social status really can impact one’s policy views on redistribution as well as one’s broader political and moral principles. That’s a nice result and something a lot more people in political science should take seriously.

Second, the deep problems with the paper are as much the fault of The System as they are of these authors. Mainly: the p < .05 standard is batshit crazy (it should really be at least p < .01), samples are routinely way too small (something related to the crazy-high p-value threshold), reviewers are often statistically naïve and/or don’t really have the time or inclination to call people on questionable analyses, and so on.

So, it’s always nice to see new papers on interesting topics. But there are lots of reasons to be dubious of broad statements in papers (and especially press releases) without really scrubbing the numbers.

Sexual disgust, moral disgust, and May-December romance

Aaron Goetz has some interesting ideas about May-December romances. Why, he wonders, do his students regularly have disgust reactions to photos of older men with younger women? Could it have something to do with proposals by DeScioli, Kurzban, me, and others relating to self-interested moral judgments?

His specific ideas are plausible. A young heterosexual man might want to discourage young women from mating with older men, so that there are more young women available for a young man like himself. An older heterosexual woman might similarly want to discourage such relationships, leaving more older men available for older women. In contrast, younger women and older men might not mind the idea of people engaging in May-December romances, because it expands their field of potential mates.

I haven’t thought deeply about the relationship between lifestyle moralizations and disgust, but I mostly agree with a paper Goetz points to – Tybur, Lieberman, Kurzban, & DeScioli’s “Disgust: Evolved function and structure.”

They distinguish between sexual disgust and moral disgust, which have complex relationships with moralizations. Sexual disgust isn’t primarily about moralization, but relates to individuals avoiding low-benefit/high-cost mating situations. But then sexual disgust can increase the likelihood of moralization, and bleed over into something like moral disgust. But then moral disgust can also arise from a moralization not based primarily in disgust – here, disgust reactions are an effect rather than a cause of a desire to moralize, and are used to communicate and coordinate condemnation of the moralized behavior. Moral disgust is often joined by moral anger, by a desire to confront rather than avoid the disgusting perpetrators.

On sexual matters, I’ve mostly studied conflicts between high-commitment strategies and low-commitment strategies. Here, there are complex interactions involving sexual disgust, moral disgust, and moralizations. High-commitment folks tend to want low-commitment behaviors to be more costly and difficult – sometimes through direct moralization, and sometimes through policy preferences, e.g., making birth control and abortion services harder to obtain or increasing legal sanctions for partying. High-commitment folks also express more disgust at low-commitment sex, abortion, and so on. Is this properly called “sexual” or “moral” disgust? Is one aspect of this psychologically prior to the others? I have no idea.

One can easily imagine sexual disgust playing out without (much) accompanying moralization. A couple of years ago, in a Super Bowl ad, supermodel Bar Refaeli made out with an unattractive, overweight nerd around her own age. Many found it disgusting. If we gave a survey, I’m sure there’d be some tendency for people to “moralize” the behavior (e.g., endorsing survey items saying it was “wrong” or “shouldn’t have been allowed” or something). But this isn’t really a big moralization arena. It’s just sexual disgust based on the mating irrationality of a supermodel hooking up with an unattractive man. Having said that, I bet we could significantly reduce the disgust effect if this played out more slowly, and we were given lots of reasons why she might want to do this despite his physical features (like if it were a movie and we learned over time, prior to the kiss, that he was smart, rich, a good protector, had attractive exes, would make a good father, and so on).

So now the question is about May-December romances. One big issue: To what extent is this just plain-vanilla sexual disgust without much moralization? That is, to what extent are these just young people who aren’t at a life-history stage where they’re planning to have kids yet, and so the idea of a young woman getting together with a much older man just seems like a low-benefit/high-cost behavior in the absence of lots of other information making it seem like a better idea for the woman than it might first appear? Sure, there may be some modest degree of accompanying moralization, but no more than you’d find at anything else that triggers sexual disgust (or pathogen disgust).

If it’s mostly just ordinary sexual disgust, then young women might find it more disgusting than young men. Also, the disgust reaction in private settings would probably approximate the disgust reaction in public settings, again particularly for women.

Another big issue: If there are elements of strategic moral disgust, what are the strategic elements? Goetz suggests the young men/old women vs. old men/young women divide. I’d speculate there might be others as well. Lots of these kids might be thinking about their father abandoning their mother when they think about May-December, so maybe there’s some small difference between kids whose fathers are still married to their mothers and kids whose fathers are not.

If it’s mostly strategic moralization, then there’d probably be some effect where the disgust reactions are more powerfully felt in public rather than private settings. I also wonder whether expressing this kind of disgust could also serve other functions – e.g., young women signaling to their younger male classmates (or to their older male professors) that they’re interested in younger rather than older men. So maybe there’s also a manipulation here involving the gender and age of the person/people to whom they’re expressing the disgust judgment.

In the end, I suspect this is mostly about ordinary sexual disgust (like the supermodel commercial), played out among a group (college students) who are especially unlikely to view May-December romances as currently sensible for themselves and their siblings and friends, and not driven much by strategic moralization. But that’s just a guess.

For Nerd Eyes Only: Causality and controls in multiple regressions

Ezra Klein provides some really smart thoughts about statistical controls. He generously says that researchers already know the points he raises, but I’m not sure that’s typically true. The general topic is one I’ve had to think about a lot in studying politics and religion — where there are big problems with multiple regressions with questionable variables — so let me add some additional points.

The problem Klein wrestles with is one we might call Masking Causes with Midstream Effects. Klein describes, for example, how the race of a driver might influence the reasons a driver is pulled over by police, which might in turn influence the likelihood of being arrested. A researcher might run a multiple regression predicting likelihood of arrest following a traffic stop and include among the simultaneous predictors race, the reason for the traffic stop, and lots of other variables.

The Masking Causes with Midstream Effects problem comes when the researcher says something like: “Prior studies showed strong effects of race on arrest likelihood, but we show that race isn’t really a major influence when proper controls are included.” In fact, the study shows no such thing. If race is strongly influencing reasons for traffic stops, and reasons for traffic stops are strongly influencing arrest likelihood, then in fact race remains a strong cause of arrest likelihood despite the statistical controls. As Klein rightly points out, such a study really shows (some of) the (indirect) ways in which race influences arrest likelihood. It’s not that race is unimportant, but that race enters the causal chain early and has effects on lots of variables in the model.

In theory, there are solutions to the problem. For example, the researcher might expand the model into something like a path analysis, using discrete steps to show how variables really early in the causal chain (like race) affect variables in the middle of the causal chain (like reasons for the traffic stop), and then how both the early and midstream variables jointly affect the study variable (here, arrest likelihood). The researcher then needs to be careful in describing the results, making it clear that the study shows both direct and indirect effects of variables early in the chain rather than muted or null effects of variables early in the chain.
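Here’s a bare-bones sketch of that bookkeeping with the traffic-stop variables, using plain OLS as a stand-in for proper path or SEM software. The variable names are hypothetical, and with a binary outcome like an arrest you’d normally use logistic models; the point is just the direct-plus-indirect arithmetic.

```python
import statsmodels.api as sm

# Assumption: `stops` is a DataFrame with hypothetical columns `race` (early in
# the chain), `stop_reason` (midstream), and `arrested` (the outcome).
step1 = sm.OLS(stops["stop_reason"], sm.add_constant(stops[["race"]])).fit()
step2 = sm.OLS(stops["arrested"],
               sm.add_constant(stops[["race", "stop_reason"]])).fit()

direct = step2.params["race"]                                  # race -> arrest
indirect = step1.params["race"] * step2.params["stop_reason"]  # race -> reason -> arrest
print("direct:", direct, "indirect:", indirect, "total:", direct + indirect)
```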

But wait, it gets so much worse

A study of arrests from traffic stops provides an unusually clean situation. We know that there are some facts that existed long before the traffic stop (like the driver’s race, socioeconomic status, and so on). And then the event itself unfolds more-or-less linearly over time — first the person is pulled over, then there are early interactions between the driver and the officer, the vehicle may be searched or not, then various things are found or aren’t, and an arrest happens or doesn’t. Constructing a path analysis here is pretty straightforward: The background characteristics come first, followed by the early driver-officer interactions, followed by subsequent events, leading ultimately to the decision to arrest or not.

Yet many things researchers study don’t provide as clear a case for making plausible guesses about causal priority. In political studies predicting public opinion on issues, for example, researchers often run multiple regressions simultaneously using background demographics (race, income, gender, religion, and so on) along with things like liberal-conservative self-placement, political party preference, and various kinds of “values” or “moral foundations” or political “personality” variables.

While most of the demographic items are plausibly early in the causal chain, placing ideology and party and values in the causal chain before issue opinions is largely speculative. There are strong arguments that things like ideologies and party affiliations, for example, are in fact often effects rather than causes of issue opinions. With “values” and the like, it gets worse still — often these items are virtually indistinguishable from the political opinions they’re meant to “explain” (something Kurzban and I called DERP Syndrome).

Here, multiple regressions can become extremely dicey. The problem, in short, is that multiple regressions can’t tell causes from effects. Multiple regressions are perfectly happy to report that a big effect of Variable X is really a big cause of Variable X, or that a variable that shares many of the same background causes with Variable X (even though it doesn’t itself directly or indirectly cause Variable X) is really a big cause of Variable X. Further, when big effects and/or causal siblings of a predicted variable are used as predictors in a multiple regression, the model is perfectly happy to report that actual causes aren’t really big causes, or that utterly unrelated variables are actually modest causes. (You can find an extended example on pages 227-235 of my recent book.)
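If you want to see the problem in miniature, here’s a toy simulation. X genuinely causes Y, Z is purely an effect of Y, and yet a regression of Y on both predictors cheerfully hands most of the credit to Z while shrinking X.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000

x = rng.normal(size=n)                # x genuinely causes y
y = 0.5 * x + rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)      # z is purely an EFFECT of y

# y on x alone: recovers the true effect, about 0.5.
print(sm.OLS(y, sm.add_constant(x)).fit().params)

# y on x and z together: x's coefficient collapses to roughly 0.1, while z,
# which causes nothing, gets credited with roughly 0.8.
print(sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit().params)
```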

So the Masking Causes with Midstream Effects problem is just the tip of the iceberg. There’s also the Mistaking Effects of X as Causes of X problem, the Mistaking Siblings of X as Causes of X problem, the Masking Causes of X with Effects and Siblings of X problem, the Mistaking Unrelated Variables as Causes of X Because of Their Relationships with Mistakenly Included Effects and Siblings of X problem, and so on.

Back when I was working on my dissertation and began recognizing these problems, I wrote: “Regressions are dangerous toys, and we are all clumsy children.” And the longer I work at this stuff, the more terrifying it becomes.

Take the simple case of the traffic stop. Imagine, for example, that there’s an unmeasured variable — call it Officer Racism — that is brought into play by the race of the driver and then has big effects on the likelihood of pulling over the person for Reason X as well as the likelihood of searching and arresting the person. That might mean that the correlation between being pulled over for Reason X and the arrest is itself not actually (mostly) a causal relationship, but the two variables might be largely “siblings” of each other. Still, a multiple regression might be happy to report a big causal arrow from Reason X to the arrest, and might be happy to deflate the importance of driver’s race along the way.
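Here’s that story as another toy simulation (all names hypothetical, obviously): an unmeasured racism variable is driven by the driver’s race and in turn drives both the pretext stop and the arrest, and the regression rewards the sibling while deflating race.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 10_000

race = rng.binomial(1, 0.5, size=n).astype(float)   # toy 0/1 coding of driver's race
racism = race + rng.normal(size=n)                   # UNMEASURED: officer racism
reason_x = racism + 0.3 * rng.normal(size=n)         # racism drives pretext stops...
arrest = racism + 0.3 * rng.normal(size=n)           # ...and racism drives arrests
# reason_x does not cause arrest at all; the two are "siblings" via racism.

# Arrest on race alone: recovers race's real (indirect) effect, about 1.0.
print(sm.OLS(arrest, sm.add_constant(race)).fit().params)

# Add the sibling as a "control": it soaks up nearly all the credit and race's
# coefficient collapses toward zero, even though race is the true driver here.
print(sm.OLS(arrest, sm.add_constant(np.column_stack([race, reason_x]))).fit().params)
```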

So what’s to be done?

Some researchers think these are reasons to run experiments rather than use correlational data. But experiments have their own problems. Primarily, we’re often interested in things that can’t really be investigated with experiments. Some of these are obvious — we can’t, for example, randomly assign different races or religions to people and see how things turn out. Some are less obvious — many of the things we care about are complex and enduring dispositions that take years for people to form (say, people’s religiosity or political views); studying how these things change a bit after spending a few minutes in a lab doesn’t necessarily tell us anything interesting about how these things reach their more stable baselines over time. There are plenty of unnoticed and not-really-defensible assumptions in much of the experimental work that equates short-term changes from baselines with the long-term causes of those baselines.

My own response has been to go ahead and use correlational data, but to be much more explicit and suspicious about causal assumptions. Regressions are mostly OK as tools, so long as we know what they can and can’t tell us. And, crucially, a single regression can’t tell us when we’ve made mistaken causal assumptions. We have to do extra work to gather clues about that through mediation patterns and so on. We have to occasionally limit ourselves to a restricted predictor batch containing items more plainly defensible as early in the causal chain, and then only move beyond that batch in tentative steps. We have to be especially careful about saying “X doesn’t matter” when it’s something that’s plainly early in the causal chain and appears to do good work when not flooded with questionable controls.

These points are on display in my recent work on politics. We leave out of our regression models lots of predictors that are the central stars of others’ work, because we suspect those stars are largely effects and siblings and only occasionally causal. We focus mostly on demographic items we think we can defend as early in the causal chain.

And, in fact, we come to a different set of conclusions from many political scientists — not only because we have a different view of how minds work, but also because we have a different view about the need to be cautious with regressions. Rather than assuming that the biggest correlations are obviously the causes that matter most, we keep our focus on the effects of variables that are most defensibly viewed as causal.

We can’t just throw a bunch of variables at something and think a multiple regression will sort it all out for us. We have to go slower and deeper, make explicit our causal assumptions, make explicit our uncertainty about those assumptions, and engage in the trade-offs and messy descriptions that uncertainty requires.