Part One: the researchers’ perspective
The field of social psychology is reeling from a series of crises that call into question the everyday scientific practices of its researchers. The fuse was lit by statistician John Ioannidis in 2005, in a review that outlined why, thanks particularly to what are now termed “questionable research practices” (QRPs), over half of all published research in social and medical sciences might be invalid. Kaboom. This shook a large swathe of science, but the fires continue to burn especially fiercely in the fields of social and personality psychology, which marshalled their response through a 2012 special issue in Perspectives on Psychological Science that brought these concerns fully out in the open, discussing replication failure, publication biases, and how to reshape incentives to improve the field. The fire flared up again in 2015 with the publication of Brian Nosek and the Open Science Collaboration’s high-profile attempt to replicate 100 studies in these fields, which succeeded in only 36 per cent of cases. Meanwhile, and to its credit, efforts to institute better safeguards like registered reports have gathered pace.
So how bad did things get, and have they really improved? A new article in pre-print at the Journal of Personality and Social Psychology tries to tackle the issue from two angles: first by asking active researchers what they think of the past and present state of their field, and how they now go about conducting psychology experiments, and second by analysing features of published research to estimate the prevalence of broken practices more objectively.
The paper comes from a large group of authors at the University of Illinois at Chicago. It was conducted under the guidance of Linda Skitka, a distinguished social psychologist who participated in the creation of the journal Social Psychological and Personality Science and who is on the editorial board of many more social psych journals, and led by Matt Motyl, a social and personality psychologist who has published with Nosek in the past, including on the issue of improving scientific practice.
Psychology research is the air that we breathe at the Digest, making it crucial that we understand its quality. So in this two-part series, we’re going to explore the issues raised in the University of Illinois at Chicago paper, to see if we can make sense of the state of social psychology, beginning in this post with the findings from Motyl et al’s survey of approximately 1,200 social and personality psychologists, from graduate students to full professors, mainly from the US, Europe and Australasia.
Motyl’s team began by asking their participants about the state of the field now as opposed to 10 years ago. On average, participants believed that older research would only replicate in 40 per cent of cases – quite close to Nosek’s figure – but they believed that research being conducted now would have a better rate, about 50 per cent, and that generally the field was improving itself in response to the crisis.
Motyl’s team also canvassed the respondents on a range of questionable research practices, sketchy behaviours like neglecting to report all the measures taken, or quietly dropping experimental conditions from your study. Thanks particularly to work by Joseph Simmons, Leif Nelson, and Uri Simonsohn, we understand just how much these practices compromise the assumptions of scientific significance testing, making it easy to produce false positive results even in the absence of fraudulent intent. In their words, QRPs are not wrong “in the way it’s wrong to jaywalk”, the way that researchers have often implicitly been encouraged to think of them, but “wrong the way it’s wrong to rob a bank.”
Previous surveys of researchers’ own QRP usage have uncovered high levels of admissions, as if the field was rushing to the confession box to purge its sins. Here, Motyl’s team used finer-grained questioning to look at frequency (often a “yes” turned out to be “rarely” or “once”) and justification. In some cases, a researcher’s justification showed that they had misinterpreted the question and that they were actually expressing strong disapproval of the QRP – in fact, this seemed to be the case in virtually all “confessions” of data fabrication. In other cases, the context provided by a justification painted the particular research practice in a completely different light.
For example, consider the seemingly dodgy decision to drop conditions from your study analysis. If your rationale is that the condition didn’t do what it was intended to do – in an emotion and memory study, your sad video didn’t produce a sad mood in participants, for instance – it’s actually more problematic to keep what is effectively a bogus condition in your analysis than it is to exclude it (ideally in a principled way according to a registered procedure). For the new survey, independent judges evaluated all the stated justifications, and felt they legitimised the “questionable” practices in 90 per cent of cases.
Discovering these misunderstandings and justifiable practices littered through the QRP data led Motyl’s team to conclude that pre-explosion psychology practices aren’t as derelict as once feared, although the fact that 70 per cent of respondents said they are now less likely to engage in many of these practices than ten years ago suggests that all was not entirely virtuous back then.
So not perfect, but getting better, is the take within the field: a cautious optimism compared to some dire pronouncements on the state of psychology. In Part Two, we’ll look at the body of psychological research itself, to see if this optimism is justified.
A new paper in the Journal of Personality and Social Psychology has taken a hard look at psychology’s crisis of replication and research quality and we’re covering its findings in two parts.
In Part One, published yesterday, we reported the views of active research psychologists on the state of their field, as surveyed by Matt Motyl and his colleagues at the University of Illinois at Chicago. Researchers reported a cautious optimism: research practices hadn’t been as bad as feared, and are in any case improving.
But is their optimism warranted? After all, several high-profile replication projects have found that, more often than not, re-running previously successful studies produces only null results. But defenders of the state of psychology argue that replications fail for many reasons, including defects in the reproduction and differences in samples, so the implications aren’t settled.
To get closer to the truth, Motyl’s team complemented their survey findings with a forensic analysis of published data, uncovering results that seem to bolster their optimistic position. In Part Two of our coverage, we look at these findings and why they’re already proving controversial.
Motyl and his colleagues used a relatively new type of analysis to assess the quality and honesty of the data found in over 500 previously published papers in social psychology. Their approach is technical, involving weirdly-named statistics conducted upon even more statistics, so it helps to use an analogy: Just as a vegetable garden produces a variety of tomatoes, some bigger than others, some misshapen, some puny and poor for eating, an honestly-conducted body of research should bear a range of fruit in the same way. True experimental effects shouldn’t always come out exactly the same: they should vary in size from experiment to experiment, including instances when the effect is too small to be statistically significant.
These are the sorts of things you can evaluate in a body of research – in this case with the Test for Insufficient Variance, which Motyl’s study used alongside six other indices. When there were too many irregularities in the data, or bizarre regularity like identikit supermarket tomatoes, this suggested to Motyl and his colleagues that questionable research practices may have been used to make the weak results swell up to reach the desired appearance.
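To make the idea concrete, the Test for Insufficient Variance can be sketched in a few lines. This is a minimal illustration of the general technique, not Motyl et al.’s actual implementation; the function name and the example p-values are ours. The logic: convert reported p-values to z-scores, and ask whether their variance falls suspiciously short of 1 – the value you’d expect from a set of independent, honestly reported estimates.

```python
import numpy as np
from scipy import stats

def insufficient_variance_test(p_values):
    """Sketch of a Test for Insufficient Variance (TIV).

    Converts two-sided p-values to z-scores and checks whether their
    variance is suspiciously below 1. A tiny left-tail chi-square
    probability flags "identikit tomatoes": results clustered more
    tightly than chance allows, hinting that weak findings may have
    been nudged toward the same just-significant appearance.
    """
    z = stats.norm.ppf(1 - np.asarray(p_values) / 2)
    n = len(z)
    var_z = np.var(z, ddof=1)  # sample variance of the z-scores
    # Under the null (true variance = 1), (n-1)*var ~ chi-square(n-1);
    # a small cdf value means implausibly little variance.
    p_tiv = stats.chi2.cdf((n - 1) * var_z, df=n - 1)
    return var_z, p_tiv

# A suspicious cluster of "just significant" results...
var1, p1 = insufficient_variance_test([0.049, 0.048, 0.047, 0.046, 0.045])
# ...versus a more natural spread of strong, weak and null outcomes.
var2, p2 = insufficient_variance_test([0.001, 0.03, 0.20, 0.60, 0.0005])
```

Run on the first set, the variance of the z-scores is nearly zero and the test flags it; the second, messier set – the honest vegetable garden – passes.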
Crucially, however, the study found that more often than not, the indices showed low levels of anomalies, suggesting research practices are more likely to be acceptable than questionable. This was the case for studies from 2003-4, before the crisis was fully acknowledged, and the researchers found an even better picture for more recent (2013-14) papers. The fruits of the research may have been tampered with from time to time, but there was no case that the entire enterprise was “rotten to the core”.
This optimistic conclusion conflicts with similar analyses performed in the past, but this might be explained by the different approaches of collecting the data – of gathering the fruit, if you will. Past approaches automatically scraped articles for every instance of a statistic, such as every listed p-value. But this is like a bulldozer ripping out a corner of a garden and measuring everything that looks anything like a tomato, including stones and severed gnome-heads. To take just one example, articles will often list p-values for manipulation checks: confirmations that an experimental condition was set up correctly (did participants agree that the violent kung-fu clip was more violent than the video of grass growing?). But these aren’t tests to determine new scientific knowledge, rather – turning to another analogy – the equivalent of a chemist checking their equipment works before running an experiment. So Motyl’s team took a more nuanced approach, reading through every article and picking out by hand only the relevant statistics.
However, all is not rosy in the garden. At their Datacolada blog, “state of science” researchers Joseph Simmons, Leif Nelson, and Uri Simonsohn, have already responded to the new analysis and they’re sceptical. Simmons and co first note the daunting scale of the new enterprise: to correctly identify 1800 relevant test statistics from 500 papers. In an online response, Motyl’s team agreed that yes, it was time consuming, and yes, it required a lot of hands: “there are reasons this paper has many authors: It really took a village,” they said.
But Datacolada sampled some of the statistics that Motyl’s team used in their assessments and they argue that far too many of them were inappropriate, including data from manipulation checks that Motyl’s group had themselves categorised as statistica non grata. To the Datacolada team, this renders the whole enterprise suspect: “We are in no position to say whether their conclusions are right or wrong. But neither are they.” In their response, Motyl’s team make some concessions, but they argue that some of the statistic selection comes down to difference of opinion, and defend both their overall procedure and the number of coding errors they expect their study will contain. So….
So doing high-quality science isn’t straightforward. Neither is doing high-quality science on the quality of science, nor is gathering everything together to form high-quality conclusions. But if we care about the validity of the more sexy findings in psychology – the amazing powers of power poses to make you physically more confident, how you can hack your happiness simply by changing your face, and how even subtle social signals about age, race or gender can transform how we perform at tasks – we need to care about psychological science itself, how it’s working and how it isn’t. (By the way, those findings I just listed? They’ve all struggled to replicate.)
There are surely ways to improve the methods of this new study – perhaps not coincidentally, Datacolada’s Leif Nelson is running a similar project – but even if the new assessment does include some irrelevant statistics, it will likely be an advance on past analyses that included every irrelevant statistic.
So … the new insights have budged my position on the state of science a little: I’m still worried, but I can see a little more light among the dark. Motyl’s group make the case that social psychology isn’t ruined, that the garden isn’t totally contaminated. I hope so. But it’s not hope on its own that will move our field forward, but research, debate, and making sense of the evidence. After all, psychology is too good to give up on.