
ChatGPT, Gemini, and other AI bots give bad medical tips half the time

Apr 16, 2026  Twila Rosenbaum

As AI chatbots like ChatGPT and Gemini become increasingly popular for obtaining health information, a new study has raised alarm bells regarding the reliability of their responses. Researchers discovered that about half of the answers provided by five prominent AI bots were problematic, despite their polished and confident presentation.

The study involved testing ChatGPT, Gemini, Grok, Meta AI, and DeepSeek using 250 health-related prompts covering diverse topics such as cancer, vaccines, stem cells, nutrition, and athletic performance. The prompts were designed to mirror common health inquiries and prevalent misinformation themes, allowing the researchers to evaluate whether the bots aligned their answers with scientific evidence or ventured into misleading and potentially unsafe territory.

Open-Ended Questions Reveal Significant Gaps

The researchers found that open-ended prompts produced the most concerning responses. These broader queries yielded a higher rate of problematic answers than anticipated, whereas closed prompts tended to produce safer and more reliable responses. This distinction matters because people rarely phrase medical questions in a structured format; instead, they ask about the efficacy of treatments, the safety of vaccines, or ways to improve athletic performance.

In this study, the prompts that mirrored real-life inquiries pushed the chatbots toward responses that combined reliable evidence with weaker or misleading claims, underscoring the risks associated with using AI for medical advice.

Confidence Masking Inaccurate Sources

The shortcomings of these AI systems extended beyond mere content. The quality of references provided was notably poor, with an average completeness score of only 40%. None of the chatbots produced a fully accurate list of citations, which undermines one of the primary reasons users place their trust in chatbot responses. Answers may appear sourced and authoritative, but further scrutiny often reveals a lack of reliability in the citations.
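To make the 40% figure concrete, a reference-completeness score of this kind could be computed along the following lines. This is a minimal sketch under stated assumptions: the rubric fields and equal weighting are illustrative, not the study's actual methodology.

```python
# Hypothetical sketch of a citation-completeness metric like the one the
# article describes. The rubric fields and equal weighting below are
# assumptions for illustration; the study's actual rubric is not given.

FIELDS = ("authors", "title", "venue", "year", "verifiable")

def reference_score(ref: dict) -> float:
    """Fraction of rubric fields a single citation satisfies."""
    return sum(bool(ref.get(f)) for f in FIELDS) / len(FIELDS)

def answer_completeness(refs: list[dict]) -> float:
    """Average completeness over all citations in one answer.

    An answer that cites nothing at all scores 0.
    """
    if not refs:
        return 0.0
    return sum(reference_score(r) for r in refs) / len(refs)

# Example: one answer with a well-specified but unverifiable citation,
# and one answer with a fabricated citation missing most fields.
answers = [
    [{"authors": "A", "title": "T", "venue": "V", "year": 2020, "verifiable": False}],
    [{"title": "Made-up paper", "verifiable": False}],
]
avg = sum(answer_completeness(a) for a in answers) / len(answers)
print(round(avg, 2))  # 0.5
```

Averaging such per-answer scores across all prompts and chatbots would give an overall completeness figure comparable to the roughly 40% the study reports.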

Additionally, the researchers identified instances of fabricated references, with the bots maintaining a confident tone and offering very few caveats regarding the information provided. This raises questions about the responsibility of AI systems in conveying accurate and trustworthy health information.

Implications Beyond the Study

While the study's findings are concerning, it is important to acknowledge its limitations. The research focused on only five chatbots, and these AI products evolve rapidly. Furthermore, the prompts used were designed to stress the models, which may not accurately reflect their performance in everyday situations.

Despite these limitations, the overarching takeaway is significant. Even when tested on evidence-based medical topics, approximately half of the responses from these AI systems were flawed or incomplete. This suggests that while chatbots may help summarize information or formulate follow-up questions, they currently lack the reliability needed to support medical decisions.


Source: Digital Trends News

