This past week it was hard to miss reading about the “air rage” study by Katherine DeCelles and Michael Norton, published in *PNAS*. They argued that economy-class passengers exposed to status inequalities when the flight had a first-class section were more likely to become belligerent or drunk — especially when the airplane boarded from the front and they had to walk right by those first-class 1%-ers in their sprawling leather seats. (article; sample news report).

Sounds compelling. But Andrew Gelman had doubts, based on some anomalies in the numbers and the general idea that it is easy to pick and choose analyses in an archival study. Indeed, for much research, one can level the critique that researchers are taking advantage of the “garden of forking paths”, but it’s harder to assess how much the analytic choices made could actually have influenced the conclusions.

This is where I found something interesting. In research that depends on multiple regression analysis, it’s important to compare zero-order and regression results. First, to see whether the basic effect is strong enough, so that the choices made for which variables to control for wouldn’t impact the results too much. Second, to see whether the conclusions being drawn from the regression line up with the zero-order interpretation of the effect that someone not familiar with regression would likely draw.

It’s already been remarked that the zero-order stats for the walk-by effect actually show a negative correlation with air rage; the positive relationship comes about only when you control for a host of factors. But the zero-order stats for the basic first-class air rage effect on economy are harder to get clear in the report. In the text, they are reported in raw, absolute numbers: over ten times as many air rage incidents occur on a yes/no basis in economy class when first class is present compared to when it is not. However, these raw numbers are naturally confounded by two factors that are highly correlated with the presence of a first class (see table S1 in the supplementary materials): length of flight and number of seats in economy. So you have to control for that in some way.

There were no data on the actual number of passengers, but number of seats was used as a proxy for number of passengers, given that over 90% of flights are typically fully sold on that airline. In the article, number of seats and flight time are controlled for together, entered side by side in the paper’s regression analysis on raw incident numbers per flight. Not surprisingly, seats and time are highly correlated, nearly .80 (bigger planes fly longer routes), so adding one once the other is already in the model will not improve prediction much.

But most importantly, this approach doesn’t reflect that the effect of time and passenger numbers on behavioral base rates is **multiplicative**. That is, the raw number of incidents of any kind on a plane is a function of the number of people **times** the length of the flight, before any other factors like first-class presence come into play. So what you need to do to model the rate of occurrence is introduce an **interaction term** multiplying those two numbers, not just enter the two as separate predictors.
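To make the multiplicative point concrete, here is a minimal sketch. It assumes a constant per-seat-hour incident rate, and the rate value, seat counts, and flight lengths are all invented for illustration. It shows that the effect of extra flight hours on the expected raw count depends on how many seats there are, which is exactly what a purely additive specification cannot represent.

```python
# Hypothetical baseline: the same chance of an incident for every seat-hour.
RATE_PER_SEAT_HOUR = 1e-6  # invented value, not taken from the paper


def expected_incidents(seats, hours, rate=RATE_PER_SEAT_HOUR):
    """Expected raw incident count when exposure is seats * hours."""
    return rate * seats * hours


def effect_of_longer_flight(seats, short_hours=2.0, long_hours=8.0):
    """Change in expected count when the same plane flies longer."""
    return expected_incidents(seats, long_hours) - expected_incidents(seats, short_hours)


small_plane = effect_of_longer_flight(seats=100)
large_plane = effect_of_longer_flight(seats=300)

# Under an additive model these two numbers would be equal.
print(small_plane, large_plane, large_plane / small_plane)
```

With three times the seats, the same six extra hours add three times as many expected incidents, so the "effect of hours" is not a single number unless the seats-by-hours product is in the model.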

Or to put it another way – the analysis they reported might give a much lower effect of first-class presence if they had taken as their outcome variable the rate of air rage incidents per seat, per flight hour, because the raw count is still confounded with the size and length of the flight if you just control for the two of them in parallel.

Still confused? OK, one more time, with bunnies and square fields.

Although the data are proprietary and confidential, I would really appreciate seeing an analysis either controlling for the interaction of time and seats, or dividing the incidents by the product of time and number of seats *before* they go in as DVs, to arrive at the reasonable outcome of incidents per person per hour.
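As a rough illustration of the second option, the sketch below divides incident totals by seat-hours before comparing groups. Every number in it is made up for the example (the real data are not public), and the tuple layout and function names are my own.

```python
# Invented totals for illustration only; NOT the study's data.
# Each tuple: (flights, seats per flight, hours per flight, total incidents)
no_first_class = (1000, 100, 1.5, 3)
with_first_class = (1000, 200, 5.0, 30)


def seat_hours(group):
    """Total exposure: flights * seats * hours."""
    flights, seats, hours, _incidents = group
    return flights * seats * hours


def incidents_per_seat_hour(group):
    """Incident total divided by total exposure."""
    return group[3] / seat_hours(group)


# Ratio of raw counts, ignoring how much exposure each group had.
raw_ratio = with_first_class[3] / no_first_class[3]

# Ratio of rates after dividing by seats * hours.
rate_ratio = incidents_per_seat_hour(with_first_class) / incidents_per_seat_hour(no_first_class)

print(raw_ratio, rate_ratio)
```

With these invented numbers the raw counts differ by a factor of 10, while the per-seat-hour rates differ by a factor of 1.5. How far the real gap would shrink is precisely what the requested analysis would tell us.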

One other thing. Beyond their mere effects on base rates, long hours and many passengers may each affect raw numbers of air rage in a faster-than-linear way, each following something like a quadratic function, by also leading to higher rates of incidents, even per person per hour, through higher numbers of potential interactions and flight fatigue. So ideally, if you’re keeping raw numbers as the DV, the **quadratic** as well as linear functions of passenger numbers and flight time need to be modeled.
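Here is one way the quadratic term could arise, as a sketch under assumptions of my own: suppose the per-seat-hour rate itself creeps up linearly with flight length (a hypothetical `fatigue` coefficient). The raw count then contains both an hours term and an hours-squared term.

```python
def expected_incidents(seats, hours, base_rate=1e-6, fatigue=0.1):
    """Expected raw count when the per-seat-hour rate rises with flight length.

    base_rate and fatigue are invented illustration values.
    rate  = base_rate * (1 + fatigue * hours)
    count = seats * hours * rate
          = base_rate * seats * (hours + fatigue * hours**2)
    """
    rate = base_rate * (1 + fatigue * hours)
    return seats * hours * rate


# Doubling flight time with no fatigue effect: the count simply doubles.
ratio_flat = expected_incidents(200, 8, fatigue=0.0) / expected_incidents(200, 4, fatigue=0.0)

# Doubling flight time with a fatigue effect: the count more than doubles.
ratio_fatigue = expected_incidents(200, 8) / expected_incidents(200, 4)

print(ratio_flat, ratio_fatigue)
```

The same construction applies to passenger numbers if crowding raises the per-person rate, which would add a seats-squared term as well.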

So the bottom line here for me is not that the garden of forking paths is in play, but that the wrong path appears to have been taken. I look forward to any feedback on this – especially from the authors involved.

The huge suppressor effects (a zero-order correlation of .025 becoming an OR of 3.8) seem to be what is driving most of the headline results. However, it is perhaps not surprising that an article with one set of statistical errors should turn out to have another set. The authors are somewhere on the don’t know–don’t care continuum, but will presumably be getting $5,000 a day to advise the airlines how to avoid air rage very shortly.

Well, I wouldn’t take that .025 too seriously as an effect size, given the extremely low incidence of air rage; applying Pearson correlation to a low-incidence binary where tetrachoric is more appropriate can underestimate ES by a factor of 10 or more (see Ferguson’s critique of Rosenthal’s medical-study examples, http://www.christopherjferguson.com/effectsRoGP.pdf). But that also raises the question of whether standard logistic regression is the most appropriate tool – as I understand it, alternatives would tend to increase the magnitude of the odds ratio, too. What prompted my initial investigation was a desire to have two figures for zero-order and regression results that could be procedurally compared, precisely to resolve this kind of doubt.
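A quick sketch of why a Pearson correlation on a rare binary outcome can look tiny next to a sizable odds ratio. The 2x2 counts are invented, and this computes only the phi coefficient (Pearson's r for two binary variables) and the odds ratio; it does not attempt a tetrachoric estimate.

```python
import math

# Invented 2x2 table, for illustration only:
#                 incident   no incident
# exposed            a=40      b=99960
# not exposed        c=10      d=99990
a, b, c, d = 40, 99960, 10, 99990


def phi_coefficient(a, b, c, d):
    """Pearson correlation between two binary variables, from 2x2 counts."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


def odds_ratio(a, b, c, d):
    """Odds of an incident when exposed divided by odds when not exposed."""
    return (a * d) / (b * c)


phi = phi_coefficient(a, b, c, d)
oratio = odds_ratio(a, b, c, d)
print(phi, oratio)
```

Here the odds ratio is about 4 while phi stays below .01, simply because incidents are so rare in both rows.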

Nick:

“somewhere on the don’t know–don’t care continuum”–well-put!