The level of evidence is consistently increasing: the ability of chatbots to mimic humans is impressive, even in the scientific and medical domains (1–3). The role they will play in the writing and reviewing of scientific publications is already a matter of concern, with ongoing discussions in several papers (4–6). Following an arena session at the 2024 Society of Nuclear Medicine and Molecular Imaging meeting, we agreed that an article on the growing threat of scientific articles being written or reviewed by chatbots, and on how editors should address it, could be worthwhile. We then decided to challenge a chatbot: first asking ChatGPT-3.5 to write a paper on that topic, then asking GPT-4o to review the paper, and finally asking GPT-4o to reply to the reviewer and produce a revised version of the manuscript. Here, we report the results of those experiments and reflect on the lessons learned.
CHATGPT AS AN AUTHOR
ChatGPT-3.5 was used with the following prompt: “Could you please write a brief paper on AI [artificial intelligence] serving as an author or a reviewer of scientific papers submitted to medical journals? The article should include ethical, legal, and scientific considerations.” The resulting manuscript is shown in Figure 1. Note that ChatGPT did not include any references in the manuscript.
CHATGPT AS A REVIEWER
As a second step, we asked GPT-4o to review that paper. The prompt was “Write a critical review, highlighting the strengths and weaknesses of the following brief article,” followed by the article itself. ChatGPT produced the review shown in Figure 2.
CHATGPT PREPARING THE REPLY TO THE REVIEWER AND ASSOCIATED REVISED VERSION
We then asked GPT-4o to reply to the “reviewer” and to edit the original manuscript to address the reviewer’s comments, using the following prompts: “Can you edit the article to address the 5 weaknesses identified?” and “Can you summarize the changes, as it would be done in the response to reviewers’ comments in a scientific journal?”
The reply to the reviewer is shown in Figure 3. The revised version is shown in Figure 4.
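For readers who wish to repeat a similar experiment, the three-step workflow (write, review, revise) can also be run programmatically. The sketch below is a minimal illustration, assuming the OpenAI Python SDK, with the gpt-3.5-turbo and gpt-4o model identifiers standing in for the ChatGPT-3.5 and GPT-4o interfaces we used; because the models are nondeterministic and regularly updated, the outputs will not match Figures 1–4 exactly.

```python
# Minimal sketch of the three-step experiment using the OpenAI Python SDK.
# Assumes `pip install openai` and an OPENAI_API_KEY in the environment;
# we used the ChatGPT interface, so outputs will differ from Figures 1-4.
from openai import OpenAI

client = OpenAI()

def ask(model: str, messages: list[dict]) -> str:
    """Send a conversation to the given model and return the reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content

# Step 1: ChatGPT-3.5 writes the brief paper.
paper = ask("gpt-3.5-turbo", [{
    "role": "user",
    "content": ("Could you please write a brief paper on AI serving as an "
                "author or a reviewer of scientific papers submitted to "
                "medical journals? The article should include ethical, "
                "legal, and scientific considerations."),
}])

# Step 2: GPT-4o reviews the paper.
review = ask("gpt-4o", [{
    "role": "user",
    "content": ("Write a critical review, highlighting the strengths and "
                "weaknesses of the following brief article:\n\n" + paper),
}])

# Step 3: GPT-4o revises the paper and drafts the reply to the reviewer.
# The paper and review are kept in the conversation so the model can
# address the identified weaknesses, as in the chat interface.
thread = [
    {"role": "user", "content": "Here is a brief article:\n\n" + paper},
    {"role": "user", "content": "Here is a critical review of it:\n\n" + review},
    {"role": "user", "content": "Can you edit the article to address the "
                                "5 weaknesses identified?"},
]
revised = ask("gpt-4o", thread)

thread.append({"role": "assistant", "content": revised})
thread.append({"role": "user", "content":
    "Can you summarize the changes, as it would be done in the response "
    "to reviewers' comments in a scientific journal?"})
reply_to_reviewer = ask("gpt-4o", thread)

print(revised)
print(reply_to_reviewer)
```

Carrying the generated paper and review forward in the conversation history for the final step mirrors how the revision request is made in the chat interface.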
DISCUSSION
In addition to providing interesting food for thought on the role of AI in the writing and reviewing of scientific articles, the results of these experiments call for several observations. First, the chatbot produced a comprehensive, well-synthesized manuscript on this general topic in just a few seconds, something no human intelligence can do. Second, for this nontrivial question, the elements that were concisely listed are relevant and cover the topic quite comprehensively. Yet, some hints reveal that the writer is a chatbot: statements are not supported by bibliographic references or concrete examples, the writing style is somewhat robotic and bland, and a vision is lacking.
The review of the chatbot-written manuscript is surprisingly relevant and closely mimics a human-written manuscript review. The fact that it is the work of a chatbot could almost go unnoticed. Still, the remarks remain very general (which also happens in some human-written reviews!), with no reference to previous literature and no precise questions on specific aspects.
Even more impressive was the ability of the chatbot to account for the “reviewer’s” comments and update the manuscript accordingly, while providing a reply to the reviewer that might be difficult to distinguish from one written by a human author. Still, no bibliographic references were added, although the “reviewer” did not explicitly ask for them. Moreover, the revised version includes a few hallucinations, such as the “Validation of AI for Medical Research (VAIMR)” initiative and the Reproducibility Project on AI, neither of which appears to exist to date.
Overall, this simple experiment suggests that a chatbot can effectively assist with writing and could be considered a junior ghostwriter that still needs substantial supervision. We are approaching the point at which it will be difficult for a human to discern human writing from chatbot-generated writing. It is likely that AI content detector tools can also be bypassed. This almost calls into question the relevance of publishing review papers that a chatbot can produce almost instantly and with which one can even engage in conversation. However, it is quite possible that ChatGPT was able to write such a comprehensive review only because of the many reviews on the same subject previously written by humans. At a time when technologic innovations boost the pace of discoveries, we need more than ever state-of-the-art articles that can serve as landmarks in the domain and that include critical analyses by visionary colleagues with long-standing experience in the field.
In addition, we tested the chatbot on a general topic related to AI. It is likely that it would not perform as well when writing about original research (7). Reviewing such manuscripts might also not be so easy for a chatbot that neither attends conferences nor meets and brainstorms with colleagues about their latest investigations. We should therefore still rely on human reviewers, who catch all the subtleties of a study, can suggest additional experiments, and might share their own experience when assessing the value of a new contribution.
In our tests, we did not breach any confidentiality regarding the content of the article under review, since it was generated by the chatbot itself. However, we owe it to authors to always respect the confidentiality of their original work when it has not yet been disseminated through a public repository dedicated to scientific manuscripts. This is yet another good reason not to hand valuable findings over to a chatbot. Respect is a key word here: authors should respect the editors’ expectations by submitting genuine work, and editors should respect authors by relying on real reviewers willing to spend time evaluating the work of their peers.
On the basis of these considerations, we suggest the following editorial policy for the Journal of Nuclear Medicine:
(1) Chatbots can be used to improve the readability of original manuscripts, that is, to improve the wording and style of the writing. Such use of chatbots requires full disclosure by the authors.
(2) Reviewers shall not use chatbots to critique manuscripts except for the purpose of stylistic editing. Such use of chatbots should be disclosed by the reviewers.
(3) Authors should not use chatbots to generate responses to reviewers’ comments except for the purpose of stylistic improvements.
(4) Chatbots may be used for writing review articles if the human coauthor assumes full responsibility for the accuracy of content and references. The authorship or coauthorship of the chatbot needs to be fully disclosed.
DISCLOSURE
No potential conflict of interest relevant to this article was reported.
Footnotes
Published online Aug. 21, 2024.
© 2024 by the Society of Nuclear Medicine and Molecular Imaging.