The scrutiny of scientific articles by other researchers – peer review – has long been a fundamental element in the quality assurance of research. The review process has, however, been subject to criticism for being subjective and inadequate. Several attempts have been made to improve the process by, for example, making it more open and by allowing comments on already published articles.
Another alternative that is now being tested is the use of computer software that can automatically review parts of a study. Stat Reviewer is one such program.
Easier for a computer to detect mistakes
When Timothy Houle, assistant professor at Wake Forest School of Medicine in North Carolina, complained that he kept encountering the same statistical mistakes in the studies he reviewed, his friend and systems developer Chad deVoss reacted. He reasoned that recurring mistakes in the presentation of statistics, which is also governed by reporting guidelines, should be easier for a computer to detect than for a person.
So, together with Timothy Houle, Chad deVoss – who is CEO of Next Digital Publishing in Wisconsin – developed the Stat Reviewer program. It identifies and analyses the component parts of a study and produces a report that flags both incorrect handling of statistics and departures from methodological guidelines.
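The kind of rule-based check such a tool can perform is easy to illustrate. The sketch below is not Stat Reviewer's actual code; it is a minimal, hypothetical example of one such rule: recompute a reported percentage from its numerator and denominator, and flag any figure that does not add up.

```python
import re

# Pattern for expressions such as "37/120 (30.8%)", a form commonly
# found in the results sections of clinical papers.
PATTERN = re.compile(r"(\d+)\s*/\s*(\d+)\s*\(\s*(\d+(?:\.\d+)?)\s*%\s*\)")

def check_percentages(text, tolerance=0.1):
    """Recompute each reported percentage and return a list of discrepancies."""
    issues = []
    for match in PATTERN.finditer(text):
        numerator, denominator, reported = match.groups()
        actual = 100 * int(numerator) / int(denominator)
        if abs(actual - float(reported)) > tolerance:
            issues.append(f"{match.group(0)}: recomputed value is {actual:.1f}%")
    return issues

# The second figure below is deliberately inconsistent (10/120 is 8.3%).
print(check_percentages("Response: 37/120 (30.8%); dropout: 10/120 (12.5%)."))
# → ['10/120 (12.5%): recomputed value is 8.3%']
```

A real checker would layer many such rules (p-values versus test statistics, totals versus subgroup counts, guideline checklists), but the principle is the same: mechanical arithmetic that is tedious for a reviewer is trivial for a program.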
The idea is that people will get better at doing this, not that computers will take over.
“We hope that, in the long term, Stat Reviewer will be able to change the culture surrounding methods and statistics, and will help authors to become better at producing robust studies. It should not be possible to use the program as a shortcut”, says Chad deVoss.
Publishers are testing the program
The publisher BioMed Central has taken an interest in the program and has been involved in its development since 2014. Since autumn 2016, BioMed Central has been leading an experiment to investigate whether the statistical and methodological review of research can be automated.
The experiment includes four journals: Trials, Critical Care, BMC Medicine and Arthritis Research & Therapy. Stat Reviewer is being used in parallel with the regular working method in a blinded study. The primary objective is to measure how many methodological errors Stat Reviewer detects compared with normal peer review, but the study will also evaluate how well the computer’s proposals for improvement are received by authors and reviewers. Will they be willing to follow its advice?
All reviewed articles must go through the entire process up until the final decision, which means that the results of the study will not be known until some time in 2017. It should then be possible to introduce Stat Reviewer as a part of the review process and to proceed with its use in more journals.
“The program is currently restricted to use in randomised controlled trials, but the team behind Stat Reviewer is working to make it applicable to all empirical research. The intention of our work, however, is not to replace the reviewers. We want to support and reinforce their evaluations by checking fundamental elements”, says Daniel Shanahan, editor of BioMed Central.
Could provide valuable support
Stefan Eriksson, lecturer at Uppsala University’s Centre for Research Ethics and Bioethics, believes that an automatic initial run-through could provide reviewers with valuable support.
“More and more studies are being conducted and peer review is something that both takes a great deal of time and, in the view of many, does not garner a lot of respect. To receive assistance in the evaluation of methodological and statistical aspects sounds pretty good to me. An initial automated run-through should be able to increase the efficiency of the system”, says Stefan Eriksson.
But might it be possible for automated run-throughs to do more than check structures and calculations? Chad deVoss at Stat Reviewer is convinced that it is. He points to the developments that are taking place in research into artificial intelligence (AI), where new computer systems can independently learn to interpret complex contexts and situations. Several large companies, such as Google and Facebook, are currently making major investments in order to develop AI that can assess linguistic content.
“I can envisage that, in ten years’ time, a system will emerge that has consumed so much data and has made such exact self-adjustments that it will be able to evaluate a study more effectively than a person would. And, if you trust computer-generated reviews of both the methodological and subjective components in a study, why would you then not allow the computer to perform an objective assessment when it is time for a peer review?”, asks Chad deVoss rhetorically.
Peers must still review
Stefan Eriksson doesn’t think this would be a particularly good idea. He believes that the major part of a peer review must be conducted by an actual peer.
“The scientific process is dependent upon the fact that we read and review each other’s results and then build upon these. This is such a central part of scientific activity that we cannot substitute it with an automated process. Help to do this more efficiently would be welcome, but what is not desirable is for such a process to take over the entire task.”
Daniel Shanahan at BioMed Central believes that his associates at Stat Reviewer can achieve much but, in view of the complexity of the review process, he finds it hard to see that a review could be fully automated. This position is also shared by Kimmo Eriksson, professor of mathematics at Mälardalen University. In his role as a frequent editor and reviewer, he thinks that the development is interesting but that automatic evaluation remains a distant prospect.
“Take the example of the theoretical article I’ve just been editing. It contained an interesting idea but the authors were not particularly clear-headed concerning the theory. As an editor, I was able to help them sort it out. Reviewing an analysis is one thing, but to automate the evaluation of theory is, I think, considerably more difficult and theory is an essential element of science”, says Kimmo Eriksson.
Show what is good and bad
Fredrik Heintz is an associate professor of computer science at Linköping University and chairs the Swedish AI Society. He believes that there will be a continued future need for editors and reviewers, but that they will work at a higher level. Artificial intelligence can function as a support in peer review, but people will need to teach the systems what is good and bad.
“The task of an editor could be to train a support system by citing examples of contributions that should either be accepted or refused. But this system will need to be maintained if it is to remain relevant and not just accept those things that resemble what was previously approved by the system.”
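As a toy illustration of that idea – not any actual system at BioMed Central or Stat Reviewer – an editor's past accept/reject decisions could seed a simple bag-of-words naive Bayes classifier, which, as Fredrik Heintz notes, would have to be retrained as editorial standards evolve:

```python
import math
from collections import Counter, defaultdict

class EditorAssistant:
    """Tiny naive Bayes classifier trained on an editor's past decisions."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of examples

    def train(self, text, label):
        """Record one accept/reject decision made by a human editor."""
        self.label_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def suggest(self, text):
        """Return the label whose training examples the text most resembles."""
        words = text.lower().split()
        vocab = {w for counts in self.word_counts.values() for w in counts}
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            score = math.log(count / sum(self.label_counts.values()))
            total = sum(self.word_counts[label].values())
            for w in words:
                # Laplace smoothing so unseen words do not zero out a label.
                score += math.log(
                    (self.word_counts[label][w] + 1) / (total + len(vocab) + 1)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label

assistant = EditorAssistant()
assistant.train("randomised controlled trial with preregistered protocol", "accept")
assistant.train("no control group and unreported sample size", "reject")
print(assistant.suggest("preregistered randomised trial protocol"))  # → accept
```

The example also makes Heintz's warning concrete: a classifier like this can only echo patterns in what was previously approved, which is precisely why it would need ongoing human maintenance.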
According to Fredrik Heintz, however, there are no fundamental reasons why AI should not eventually be able to get close to attaining human abilities. In recent years, machine-learning systems have driven rapid development in this field, and this will come to influence research on many levels.
In the future, it is conceivable that artificial intelligence will begin to be used to reproduce studies in order to assure their quality. Fredrik Heintz believes that this would also represent a way for AI systems to learn the research process.
“The real challenge”, says Chad deVoss, “is to understand what happens when artificial intelligence reaches a point at which it can propose experiments that will advance research within a certain field. Would this AI system then produce some form of study that another AI system could evaluate?”
Learn to understand human needs
How shall such research be managed so as to ensure that there is a focus on human needs? Is there a risk that intelligent systems will begin to do things for their own benefit and cannot then be stopped? These are concerns that often feature in the general debate regarding the risks of AI and which are even being discussed by many researchers beyond the field of AI.
The cosmologist Max Tegmark has, for example, claimed that it is important that we start thinking now about how we can equip artificial intelligence with some form of emotional capacity, so that it becomes able to understand human considerations. This is likely to be necessary if artificial intelligence is to be able to replicate the work of Kimmo Eriksson at Mälardalen University and help researchers to reformulate their ideas to produce a sustainable theory.