New Policies at JESP for 2018: The Why and How

As we found out starting in 2016, improving the scientific quality of articles in JESP, by requesting things such as disclosure statements or effect size reporting, means more work. More work for triaging editors, who check that the required elements are there; and also for authors, who must revise their submission if they get something wrong. To keep this work manageable, changes in publication practices at JESP proceed in increments. And with a new year, the next increment is here (announcement).

Briefly, from January 1, 2018, we will:

  • No longer accept new FlashReports as an article format.
  • Begin accepting Registered Reports: pre-registered submissions in which the Introduction and research plan are sent to peer review prior to data collection, and are accepted, revised, or rejected before the study’s results are known. These may be replication or original studies. The format also includes articles in which one or more studies are reported normally and only the final study is in Registered Report format.
  • In all articles, require any hypothesis-critical inferential analyses to be accompanied by a sensitivity power analysis at 80%; that is, an estimate of the minimum effect size that the method can detect with 80% power. Sensitivity analysis is one of the options in the freely available software G*Power (manual, pdf). It should be reported together with any assumptions necessary to derive power; for example, the assumed correlation between repeated measures.
  • Require that any mediation analyses either explain the logic of the causal model that mediation assumes (for example, why the mediator is assumed to be causally prior to the outcome), or present the mediation cautiously, as a test of only one possible causal model.
  • Begin crediting the handling editor in published articles.

Ring out the Flash, ring in the Registered

It should come as no surprise that the journal is adopting Registered Reports, given my previous support of pre-registration (editorial, article). Why FlashReports are being discontinued may require more explanation.

Since January 2016, we have emphasized that these short, 2,500-word reports of research should not be seen as an easy way to get tentative evidence published, but rather as an appropriate outlet for briefly reported papers that meet the same standards for evidence and theory development as the rest of the journal. Our reviewing and handling of FlashReports over the past two years has followed this model, and we also managed to handle FlashReports more speedily (by 3-4 weeks on average) than other article types.

With these changes, though, came doubts about the purpose of having a super-short format at all. It’s hard to escape the suspicion that short reports became popular in the 2000s out of embarrassment at the lag in publishing psychology articles on timely events. For example, after 9/11, some of the immediate follow-up research was still coming out in 2006 and later. Indeed, previous guidelines for FlashReports did refer to research on significant events.

But social psychology, unlike other disciplines (such as political science, which faces its own debates about timely versus accurate publishing), does not study historical events per se. Instead, such events serve as examples that can illustrate deeper truths about psychological processes. Indeed, many FlashReports received in the past year dealt with the 2016 election or the Trump phenomenon, but didn’t go deep enough into those timely topics to engage with psychological theory in a generalizable way.

Few problems can be solved by a social psychologist racing to the scene. Our discipline has its impact on a longer, larger scale. We inform expert testimony, public understanding, and intervention in future cases. With this view, it becomes less important to get topical research out quickly, and more important to make sure the conclusions from it are correct, fully reported, replicable, and generalizable. While we still strive for timely handling of all articles, it makes less sense to promote a special “express lane” for short articles, and more sense to encourage all authors to report their reasoning, method, and analysis fully.

Sense and sensitivity

Over the past year, the Editors and I have noticed more articles that include a justification of sample size, even though it is not formally requested by our guidelines. This prompted us to reconsider our own requirements.

Many of us saw these sample size justifications as unsatisfactory on a number of counts. Sometimes they were simply based on “traditional” numbers of participants (or, worse yet, on the optimistic n = 20 in Simmons et al., 2011, which those authors have since decisively recanted). When based on a priori power analyses, the target effect size was often arrived at ad hoc, or through review of literatures known to be affected by publication bias. In research that tests a novel hypothesis, the effect size is simply not known. Attempts to benchmark using field-wide effect size estimates are also unsatisfactory, because effect sizes vary greatly with the methodology and research question at hand (e.g., in Richard et al., 2003, the effect sizes of meta-analyses, themselves aggregating dozens if not hundreds of studies and methods, vary from near zero to r = .5 and up). Finally, authors sometimes base their sample estimates on power analysis but fall short of those numbers in data collection, showing the futility of reporting good intentions rather than good results. In 2016, we saw these weaknesses as reasons not to make a requirement out of power analyses.

And yet it helps to have some idea of the adequacy of statistical power. Power affects the likelihood that any given p-value represents a true vs. false positive (as demonstrated interactively here). More generally, caring about adequate power is part of the greater emphasis on method, rather than results, that I promoted in the 2016 editorial. A post-hoc power analysis, however, gives no new information above and beyond the exact p-value, which we now require; that is, a result with p = .05 always had about 50% power to detect the effect size actually found, 80% post-hoc power always corresponds to p ≈ .005, and so on.
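The one-to-one mapping between the observed p-value and post-hoc power can be checked numerically. Here is a minimal sketch for a two-sided z-test (a simplifying assumption on my part; the exact figures shift slightly for t-tests), using scipy:

```python
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)  # two-sided critical value, about 1.96

def post_hoc_power(z_obs: float) -> float:
    """Power to detect the observed effect, treating it as the true effect size."""
    return (1 - norm.cdf(z_crit - z_obs)) + norm.cdf(-z_crit - z_obs)

# A result landing exactly at p = .05 (z_obs = z_crit) has about 50% post-hoc power.
print(post_hoc_power(z_crit))      # approximately 0.5

# Conversely, 80% post-hoc power corresponds to p of roughly .005.
z_80 = z_crit + norm.ppf(0.80)     # about 2.80
p_80 = 2 * (1 - norm.cdf(z_80))
print(p_80)                        # approximately 0.005
```

The function simply re-labels where the observed statistic sits relative to the critical value, which is why post-hoc power adds nothing beyond the exact p-value itself.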

As suggested by incoming Associate Editor Dan Molden, and further developed in conversations with current editors Ursula Hess and Nick Rule, the most informative kind of power analysis in a report is the sensitivity analysis, in which you report the minimum effect size your experiment had 80% power to detect. Bluntly put, if you are an author, we don’t want to make decisions based on your good intentions (though you’re still welcome to report them), but rather, on the sensitivity of your actual experiment.
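For authors who prefer scripting over G*Power’s interface, a sensitivity analysis of this kind can also be run in Python with statsmodels. The design below (two independent groups of n = 50 each, alpha = .05, two-sided t-test) is an assumed example for illustration, not a recommended sample size:

```python
from statsmodels.stats.power import TTestIndPower

# Sensitivity analysis: solve for the minimum effect size (Cohen's d)
# detectable with 80% power, given the sample actually collected.
analysis = TTestIndPower()
min_d = analysis.solve_power(effect_size=None,   # the unknown we solve for
                             nobs1=50,           # assumed n per group
                             ratio=1.0,          # equal group sizes
                             alpha=0.05,
                             power=0.80,
                             alternative='two-sided')
print(min_d)  # minimum detectable Cohen's d for this design
```

Reporting this number, along with the assumptions behind it, describes what the experiment as run could actually detect, rather than what the authors intended.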

As before, there are no hard-and-fast guidelines on how powerful an experiment must be. The sensitivity of an experiment to reasonable effect sizes for social psychology will be taken into account, together with other indicators of methodological quality, and with some consideration of the difficulty of data collection. Our hope is that other journals will see the merit of shifting from reporting based on explaining intentions, to reporting based on the statistical facts of the experiment.

Mediation requires a causal model

The new policy on mediation is simply a determination to start enforcing the warnings already leveled at this statistical practice in my 2016 editorial. To quote:

As before, we see little value in mediation models in which the mediator is conceptually very similar to either the predictor or outcome. Additionally, good mediation models should have a methodological basis for the causal assumptions in each step; for example, when the predictor is a manipulation, the mediator a self-reported mental state, and the outcome a subsequent decision or observed behavior. Designs that do not meet these assumptions can still give valuable information about potential processes through correlation, partial correlation, and regression, but should not use causal language and should interpret indirect paths with caution. We reiterate that mediation is not the only way to address issues of process in an experimental design.

In line with this, mediation analyses for JESP now have to either justify the causal direction of each step in the indirect path(s) explicitly using theory and method arguments, or include a disclaimer that only one of several possible models is tested. This standard also follows the recommendations made in a forthcoming JESP methods paper by Fiedler, Harris & Schott, which shows that mediation analyses published in our field almost never mention the causal assumptions that, statistically, these methods require. Again, it is our hope that this policy and its explanation will inspire other editors to take a second look at the use of mediation, especially in studies where the sophistication it lends is more apparent than real.
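To make the causal assumptions concrete, here is a hedged sketch of a basic indirect-path estimate on simulated data (the variable names, effect sizes, and sample size are all invented for illustration). Note how the causal ordering is baked into the code before any statistics are run; the analysis itself cannot verify it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.integers(0, 2, n).astype(float)      # manipulated predictor (0/1)
m = 0.5 * x + rng.normal(size=n)             # mediator: ASSUMED causally downstream of x
y = 0.4 * m + 0.2 * x + rng.normal(size=n)   # outcome: ASSUMED causally downstream of m

# Path a: mediator regressed on predictor
a = np.polyfit(x, m, 1)[0]

# Path b: outcome regressed on mediator, controlling for predictor
X = np.column_stack([np.ones(n), x, m])
b = np.linalg.lstsq(X, y, rcond=None)[0][2]

# Indirect effect: valid as a causal quantity only if the
# x -> m -> y ordering holds, which the data alone cannot establish.
indirect = a * b
print(indirect)
```

Swapping m and y in the simulation would yield an equally computable "indirect effect" for the reversed model, which is exactly why the policy asks authors either to justify the assumed ordering or to present the result as one possible model among several.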