Here’s my individual answer to another question from the SPSP forum that we never got around to answering.
This refers to Tetlock’s work on identifying “superpredictors” and, more generally, on improving performance within geopolitical prediction markets. In those studies, the target outcome is clear and binary: Will the Republicans or Democrats control the Senate after the next election? Will there be at least 10 deaths in warfare in the South China Sea in the next year? Here, Brent suggests that editorial decisions can be treated like predictions of a paper’s future citation count, which in turn feeds into most metrics that look at a journal’s impact or importance.
Indeed, prediction markets have been used to judge academic quality in a number of areas: for example, the ponderous research quality exercises that we in the UK are subject to, or the Reproducibility Project: Psychology (I was one of the predictors in that one, though apparently not a superpredictor, because I only won 60 of the 100 available bucks). But the more relevant aspect of Tetlock’s research is the identification of what makes a superpredictor super. In a 2015 Perspectives article, the group lists a number of factors identified through research. Some of them are obvious, at least in hindsight, like high cognitive ability and motivation. Others seem quite specific to the task of predicting geopolitical events, like unbiased counterfactual thinking.
There’s a reason, though, to be skeptical of maximizing the citation count of articles in a journal. [Edit: example not valid any more, thanks Mickey Inzlicht for pointing this out on Facebook!] If I had to guess, subjective journal prestige would be best predicted by a function that positively weights citation count and negatively weights topic generality. That is, more general outlets like Psych Science have a larger pool of potential citers, independent of their prestige within a field.
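As a purely illustrative sketch of such a function (the weights and the “topic generality” score are made-up placeholders, not estimates from any data):

```python
# Hypothetical sketch only: prestige as citations weighted positively
# and topic generality weighted negatively. Weights are placeholders.
def perceived_prestige(mean_citations: float, topic_generality: float,
                       w_cite: float = 1.0, w_general: float = 0.5) -> float:
    """Toy linear model: citations raise prestige; audience breadth discounts it."""
    return w_cite * mean_citations - w_general * topic_generality

# Two outlets with the same citation rate: the narrower one comes out ahead.
print(perceived_prestige(mean_citations=5.0, topic_generality=0.9))  # broad, general outlet
print(perceived_prestige(mean_citations=5.0, topic_generality=0.2))  # specialist outlet
```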
More fundamentally, trying to game citation metrics directly might be bad overall for scientific reporting. Admittedly, there is very little systematic research into what makes an article highly cited, especially among the kinds of articles that any one journal might publish (for example, I’d expect theory/review papers to have a higher count than original research papers). But in trying to second-guess what kinds of papers might drive up impact ratings, there is the danger of:
- Overrating papers that strive for novelty in defining a paradigm, as opposed to doing important work to validate or extend a theory, including replication.
- Overrating bold statements that are likely to be cited negatively (“What? They say that social cognition doesn’t exist? Ridiculous!”).
- Even more cynically, trying to get authors to cite internally within a journal or institution to drive up metrics. From what I have seen in a few different contexts, moves like this tend to be made with embarrassment and met with resistance.
- Ignoring other measures of relevance beyond academic citations, like media coverage (and how to tell quality from quantity here? That’s a whole other post I’ve got in me.)
So really, any attempt to systematically improve the editorial process would have to grapple with a very complicated success metric whose full outcome may not be clear for years or decades. Given this, I’d rather focus on standards and trust that they will be rewarded in the metrics over the long term.
But one last thing: it’s hard to ignore that methods papers, if directly relevant to research, seem to have a distinct advantage in citations. For example, among the top 10 most-cited JESP articles, three have to do with methods, a rate far higher than the overall proportion of methods papers in the journal. In Nature’s top 100 cited papers across all sciences, the six psychology/psychiatry articles that make the cut all have to do with methods – either statistics, or measurement development for commonly understood constructs such as handedness or depression. (Eagle-eyed readers will notice that a lot of the rest are methods developments in biology.) So, although I had other reasons for calling for more researcher-ready methods papers in my JESP editorial, I have to say that such useful content in a journal isn’t so bad for the citation count, either.
Roger, thanks for responding to my question. I have to admit, it was posed largely to expose some of the uncomfortable issues that we typically don’t confront in the review process, and for which I don’t think we have any simple or satisfying answers. For example, I agree entirely with your read of the weakness of using citation metrics as the outcome of choice. Of course, we use no metric now to judge the review process, and arguably the current unexamined system has resulted in papers that overemphasize 1) novelty, 2) extreme positions, 3) citing the journal (usually indirectly, by citing the editors…), and 4) shooting for click-bait topics. I think if we discuss just what we want to be prioritizing, we can’t help but acknowledge that an unexamined system does not necessarily prioritize what we want either. If, in turn, we don’t like any of the potential outcomes we might use to “train” a team of editors and reviewers on, then we have a problem. If we can’t articulate a set of goals for rejecting 80% of the articles we see, then how can we justify continuing to use a system that has no identifiable, and thus no valid, goal?
I also resonate with your suggestion that we use standards to guide the decision: we should prioritize well-conducted research. The problem with using methodological quality as a decision tool is that we would, by consequence, accept many, many more papers than we currently do at our top journals. It is actually much easier to conduct a study well than it is to come up with a novel topic that is click-bait worthy. I could see a time in the future when people get good at running high-powered studies that replicate, for example. Would it not follow that all of these studies should be published? What distinguishing characteristic would we use to confer “high quality” status on the papers published at Psych Science or JPSP, for example? It would seem we would circle back to what the editors and reviewers thought was interesting, novel, exciting, etc. Thus, we would still need to confront how we define “quality”. I’m not saying it is easy, just that we should deal with it openly rather than assuming the current system is well designed to identify high-quality research without ever defining “quality” in a way that can be tested.
I see a couple of options that differ from the current system, all of which have been put forward in some form or another. My personal favorite would be the following:
Instead of pre-screening articles, go to a system in which everything is published that meets some minimum methodological and reporting standards (an arXiv for psychology, anyone?). These “publications” could be subject to post-publication review. Then, change the role of editors to curators rather than pre-screeners. As curators, they could invite people to review the papers in order to make them better; thus the role of reviewer would change from gatekeeper to “enhancer.” The editor-as-curator would be put in the position of identifying the papers that they think members of their guild should read, without prejudice against those that were not “anointed.” This way the entire team of editors could simply vote on what they think are the papers that deserve to be noted. This would increase the number of people vetting each paper, which would arguably make the decision more reliable. Each journal could have its own goals and guilds to represent and, in so doing, still do the job journals do now: tell us what we should be reading and anoint researchers with differing levels of status.

The kicker would be that the arXiv papers could garner as much attention as readers wanted to give them. Thus, a paper that was not originally deemed interesting by any given journal/editorial team might eventually become a citation classic and be “published” in a top journal many years down the road (a David Bowie of papers, so to speak). I still think that each journal would need to confront what it wants to prioritize and how it defines quality, but this type of system would allow a better way of evaluating just what does get anointed and what does not, and therefore make for a better indicator of “success.” It would also reduce the load on reviewers, since they would not have to be re-used over and over again as they are now, when we churn a paper down the hierarchy of journals we currently have.
Just a thought.
Brent
Brent’s comment goes straight to a question that has bugged me for a while: What really is the value added by journals and reviewers? We have learned the hard way that peer-reviewed publication in a prestigious journal is far from a guarantee that a paper is worthwhile or even believable. At the same time, everybody knows horror stories of great research that failed to meet the putative importance threshold of our marquee journals. And meanwhile, I have not heard anybody, anywhere, complain that they don’t have enough new research articles to read!
So if journals have a purpose, it is to help us, the consumers (as well as producers) of research, allocate our time to what we really should be reading in order to keep up with what’s going on in our fields. Brent’s solution seems to me like a “crowd-sourcing” process in which the intellectual market determines what we are guided to read. Sort of like the “most read” lists on some news sites. I can see the advantages, but wouldn’t that also advantage click-bait?
Another, opposite solution is to do what publications like the New Yorker do. Empower a strong-minded editor to decide what gets published. This editor has lots of support (office staff, consultants, etc.). But in the end, the editor decides — unilaterally — what gets published.
Imagine a few such psychology journals, each with strong-minded editors, each with differing outlooks. That’s what I’d read! And I’d do what I can to get Brent appointed editor of the first one.
You are describing an overlay journal:
http://blog.scholasticahq.com/post/130145117128/introducing-discrete-analysis-an-arxiv-overlay
http://www.nature.com/news/leading-mathematician-launches-arxiv-overlay-journal-1.18351
http://scitation.aip.org/content/aip/magazine/physicstoday/news/the-dayside/meet-the-overlay-journal-a-dayside-post
There’s software to make your own, although it’s already old and a little broken: http://arxivjournal.org/rioja/
I don’t think the peer review process is broken, but the incentive structure that pushes psychology journals to publish “surprising” (i.e., unlikely) outcomes based on small samples is upside-down. Review is good, but the criteria should favor true results, not false ones. Review should favor good methods and appropriate sample sizes, and should disregard the roll of the dice that determined the outcome.
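A minimal simulation makes that last point concrete. It is only a sketch under assumed conditions: the true effect is set to exactly zero, and the sample sizes and study counts are arbitrary illustrative numbers.

```python
# Sketch: what gets "published" if journals select on surprising (significant)
# results from small samples, when the true effect is exactly zero.
# All numbers are illustrative assumptions, not estimates from real journals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_per_group = 10_000, 20          # small samples, chosen arbitrarily
published_effects = []

for _ in range(n_studies):
    a = rng.normal(0.0, 1.0, n_per_group)    # true group difference is zero
    b = rng.normal(0.0, 1.0, n_per_group)
    t, p = stats.ttest_ind(a, b)
    if p < 0.05:                              # "surprising" result clears the bar
        published_effects.append(abs(a.mean() - b.mean()))

print(f"'Published': {len(published_effects)} of {n_studies} studies")
print(f"Mean |effect| among the published: {np.mean(published_effects):.2f}")
# Roughly 5% of these null studies get through, and every one of them reports
# a sizeable effect that is pure sampling noise.
```

Selecting on methods and sample size instead of on the outcome avoids rewarding exactly this kind of noise.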