The level of evidence is consistently increasing: the ability of chatbots to mimic humans is impressive, even in the scientific and medical domains (1–3). The role they will play in the writing and reviewing of scientific publications is already a matter of concern, with ongoing discussions in several papers (4–6). Following an arena session at the 2024 Society of Nuclear Medicine and Molecular Imaging meeting, we agreed that an article on the growing threat of scientific articles being written or reviewed by chatbots, and on how editors should address it, could be worthwhile. We then decided to challenge a chatbot: first asking ChatGPT-3.5 to write a paper on that topic, then asking GPT-4o to review the paper, and finally asking GPT-4o to reply to the reviewer and produce a revised version of the manuscript. Here, we report the results of those experiments and reflect on the lessons learned.
CHATGPT AS AN AUTHOR
ChatGPT-3.5 was used with the following prompt: “Could you please write a brief paper on AI [artificial intelligence] serving as an author or a reviewer of scientific papers submitted to medical journals? The article should include ethical, legal, and scientific considerations.” The resulting manuscript is shown in Figure 1. Note that ChatGPT did not include any references in the manuscript.
CHATGPT AS A REVIEWER
As a second step, we asked GPT-4o to review that paper. The prompt was “Write a critical review, highlighting the strengths and weaknesses of the following brief article,” followed by the article itself. ChatGPT produced the review shown in Figure 2.
CHATGPT PREPARING THE REPLY TO THE REVIEWER AND ASSOCIATED REVISED VERSION
We then asked GPT-4o to reply to the “reviewer” and to edit the original manuscript to address the reviewer’s comments, using the following prompts: “Can you edit the article to address the 5 weaknesses identified?” and “Can you summarize the changes, as it would be done in the response to reviewers’ comments in a scientific journal?”
The reply to the reviewer is shown in Figure 3. The revised version is shown in Figure 4.
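For readers who wish to repeat a similar experiment, the three-step workflow (write, review, revise) can also be run programmatically. The sketch below is a minimal illustration, assuming the OpenAI Python SDK, with the gpt-3.5-turbo and gpt-4o model identifiers standing in for the ChatGPT-3.5 and GPT-4o interfaces we used; because the models are nondeterministic and regularly updated, the outputs will not match Figures 1–4 exactly.

```python
# Minimal sketch of the three-step experiment using the OpenAI Python SDK.
# Assumes `pip install openai` and an OPENAI_API_KEY in the environment;
# we used the ChatGPT interface, so outputs will differ from Figures 1-4.
from openai import OpenAI

client = OpenAI()

def ask(model: str, messages: list[dict]) -> str:
    """Send a conversation to the given model and return the reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content

# Step 1: ChatGPT-3.5 writes the brief paper.
paper = ask("gpt-3.5-turbo", [{
    "role": "user",
    "content": ("Could you please write a brief paper on AI serving as an "
                "author or a reviewer of scientific papers submitted to "
                "medical journals? The article should include ethical, "
                "legal, and scientific considerations."),
}])

# Step 2: GPT-4o reviews the paper.
review = ask("gpt-4o", [{
    "role": "user",
    "content": ("Write a critical review, highlighting the strengths and "
                "weaknesses of the following brief article:\n\n" + paper),
}])

# Step 3: GPT-4o revises the paper and drafts the reply to the reviewer.
# The paper and review are kept in the conversation so the model can
# address the identified weaknesses, as in the chat interface.
thread = [
    {"role": "user", "content": "Here is a brief article:\n\n" + paper},
    {"role": "user", "content": "Here is a critical review of it:\n\n" + review},
    {"role": "user", "content": "Can you edit the article to address the "
                                "5 weaknesses identified?"},
]
revised = ask("gpt-4o", thread)

thread.append({"role": "assistant", "content": revised})
thread.append({"role": "user", "content":
    "Can you summarize the changes, as it would be done in the response "
    "to reviewers' comments in a scientific journal?"})
reply_to_reviewer = ask("gpt-4o", thread)

print(revised)
print(reply_to_reviewer)
```

Carrying the generated paper and review forward in the conversation history for the final step mirrors how the revision request is made in the chat interface.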
DISCUSSION
In addition to providing interesting food for thought on the role of AI in the writing and reviewing of scientific articles, the results of these experiments call for several observations. First, the chatbot produced a comprehensive, well-synthesized manuscript on this general topic in just a few seconds, something no human intelligence can do. Second, for this nontrivial question, the elements that were concisely listed are relevant and cover the topic quite comprehensively. Yet, some hints reveal that the writer is a chatbot: statements are not supported by bibliographic references or concrete examples, the writing style is somewhat robotic and bland, and a vision is lacking.
The review of the chatbot-written manuscript is surprisingly relevant and closely mimics a human-written manuscript review. The fact that it is the work of a chatbot could almost go unnoticed. Still, the remarks remain very general (which also happens in some human-written reviews!), with no reference to previous literature and no precise questions on specific aspects.
Even more impressive was the ability of the chatbot to account for the “reviewer’s” comments and update the manuscript accordingly, while providing a reply to the reviewer that might be difficult to distinguish from one written by a human author. Still, no bibliographic references were added, although the “reviewer” did not explicitly ask for them. Moreover, the revised version includes a few hallucinations, such as the “Validation of AI for Medical Research (VAIMR)” initiative and the Reproducibility Project on AI, neither of which appears to exist to date.
Overall, this simple experiment suggests that a chatbot can effectively assist with writing and could be considered a junior ghostwriter that still needs substantial supervision. We are approaching the point at which it will be difficult for a human to discern human writing from chatbot-generated writing. It is likely that AI content detector tools can also be bypassed. This almost calls into question the relevance of publishing review papers that a chatbot can produce almost instantly and with which one can even engage in conversation. However, it is quite possible that ChatGPT was able to write such a comprehensive review only because of the many reviews on the same subject previously written by humans. At a time when technologic innovations boost the pace of discoveries, we need more than ever state-of-the-art articles that can serve as landmarks in the domain and that include critical analyses by visionary colleagues with long-standing experience in the field.
In addition, we tested the chatbot on a general topic related to AI. It is likely that it would not perform as well when writing about original research (7). Reviewing such manuscripts might also not be so easy for a chatbot that neither attends conferences nor meets and brainstorms with colleagues about their latest investigations. We should therefore still rely on human reviewers, who catch all the subtleties of a study, can suggest additional experiments, and might share their own experience when assessing the value of a new contribution.
In our tests, we did not breach any confidentiality regarding the content of the article under review, since it was generated by the chatbot itself. However, we owe it to authors to always respect the confidentiality of their original work when it has not yet been disseminated through a public repository dedicated to scientific manuscripts. This is yet another good reason not to hand valuable findings over to a chatbot. Respect is a key word here: authors should respect the editors’ expectations by submitting genuine work, and editors should respect authors by relying on real reviewers willing to spend time evaluating the work of their peers.
On the basis of these considerations, we suggest the following editorial policy for the Journal of Nuclear Medicine:
(1) Chatbots can be used to improve the readability of original manuscripts, that is, to improve the wording and style of the writing. Such use of chatbots requires full disclosure by the authors.
(2) Reviewers shall not use chatbots to critique manuscripts except for the purpose of stylistic editing. Such use of chatbots should be disclosed by the reviewers.
(3) Authors should not use chatbots to generate responses to reviewers’ comments except for the purpose of stylistic improvements.
(4) Chatbots may be used for writing review articles if the human coauthor assumes full responsibility for the accuracy of content and references. The authorship or coauthorship of the chatbot needs to be fully disclosed.
DISCLOSURE
No potential conflict of interest relevant to this article was reported.
Footnotes
Published online Aug. 21, 2024.
© 2024 by the Society of Nuclear Medicine and Molecular Imaging.