OpenAI Significantly Improves ChatGPT's Medical Response Accuracy

OpenAI announced that it has enhanced ChatGPT's medical features with a new GPT-4.5-based model called 'GPT-4.5 Instant'. In comparative testing conducted by the company, the updated model outscored physician-written responses on all three metrics: accuracy, clarity, and comprehensiveness. OpenAI also reported a 71% reduction in the error rate of medical information.

In recent years, many companies have been pursuing applications of AI in healthcare. Expectations for AI's role in diagnostic assistance and patient information continue to rise, but incorrect medical information poses serious health risks, and ensuring accuracy and reliability is recognized as a challenge for the entire industry. Against this backdrop, OpenAI's move to strengthen its medical capabilities signals the company's intent to extend AI's practical applications into more specialized domains.

The comparative testing was conducted by OpenAI itself, which evaluated physician responses and the new model's responses along three axes: accuracy, clarity, and comprehensiveness. Using physician-written responses as the benchmark lends the method a degree of credibility for evaluating medical AI performance. However, the test design and evaluation criteria were set by the developer, so the results are a self-assessment and differ in nature from verification by an independent third party.

The reported 71% reduction in medical information error rates would be a meaningful improvement in real-world use. Misinformation carries greater risk in medicine than in most other fields. If AI responses contain substantially fewer errors, ordinary users who look into everyday health concerns through ChatGPT may be able to do so with more confidence.

On the other hand, how much to trust medical information provided by AI will remain an important point of discussion. However much accuracy improves, AI cannot examine an individual patient's condition and cannot substitute for a physician's diagnosis. OpenAI likely recognizes this, but users still need to be cautioned against placing excessive trust in AI responses.

The point to watch is what practical effect this improvement has in actual use. Performance in evaluations does not always match accuracy across diverse real-world scenarios. The true value of this update will become clearer as independent evaluations by healthcare professionals and researchers accumulate, along with feedback from actual users.


AI issue Staff

This article is an original work independently written and edited by the AI issue editorial team based on factual reporting. © AI issue. Unauthorized reproduction, redistribution, or use for AI training is prohibited.
