RFK Jr.’s AI Scandal: ChatGPT & the MAHA Report

Explore RFK Jr.'s AI scandal over his MAHA report and the risks of relying on AI in public policy.

If ever there was a case study on the risks and thrills of generative AI in high-stakes public policy, RFK Jr.’s recent “Make America Healthy Again” (MAHA) report has just written the latest chapter—and it’s not a flattering one. As of May 30, 2025, the Health and Human Services (HHS) Secretary faces mounting scrutiny after leaked details and investigative reporting revealed that his blockbuster report cited studies that simply do not exist, with many fingers pointing at the possible use of AI tools like ChatGPT to generate or verify citations[1][2]. The scandal, breaking as the report was meant to showcase the administration’s commitment to public health, has instead become a cautionary tale about the intersection of artificial intelligence, policy, and public trust.

Let’s face it: AI is everywhere these days, and not always for the better. When it comes to drafting reports, the temptation to offload research and citation work to large language models (LLMs) like OpenAI’s ChatGPT is understandable, especially for busy officials. But as this incident shows, what you gain in speed, you may lose in rigor—and credibility.

The MAHA Report: What Went Wrong?

The “Make America Healthy Again” (MAHA) report, released by Health and Human Services Secretary Robert F. Kennedy Jr. in late May 2025, was intended as a sweeping assessment of America’s chronic disease crisis among children[3]. It promised bold policy recommendations and data-driven insights. But within days, journalists at NOTUS and other outlets noticed something odd: several cited studies could not be found in the journals or databases listed[2]. For example, the American Academy of Pediatrics and the JAMA Network both confirmed to ABC News that certain papers referenced in the MAHA report did not exist in their publications[2].

When confronted, White House Press Secretary Karoline Leavitt downplayed the issue as a “formatting problem,” promising corrections and updates[2]. But industry experts, journalists, and even some of the researchers supposedly cited were not so forgiving. “The paper cited is not a real paper that I or my colleagues were involved with,” one researcher told NOTUS[1]. The report’s senior adviser, Calley Means, defended the substance of the findings, insisting that “any formatting errors have been corrected” and that the “underlying data and conclusions are correct”[2].

But here’s the rub: in policy and science, credibility is everything. When a report that’s meant to inform national health policy is riddled with citation errors—especially errors so glaring as citing imaginary studies—it undermines the whole effort. The updated version of the report removed at least some of the phantom references and softened some language, but the damage was already done[2].

The AI Connection: How Did This Happen?

While the MAHA report scandal is, on one level, a story about sloppy research, it’s also a window into the risks of relying on generative AI for high-stakes policy work. There’s no official confirmation from the HHS or RFK Jr.’s team that ChatGPT or any other AI tool was used to generate or verify citations, but the pattern of errors—nonexistent studies, misattributed findings—mirrors the notorious “hallucinations” of large language models[1][2]. These AI systems, trained on vast datasets, sometimes invent plausible-sounding references and facts that don’t actually exist.

Consider the broader context: AI-powered research tools are increasingly popular in government, academia, and industry. They promise to speed up literature reviews, draft reports, and even suggest policy recommendations. But as of May 2025, their limitations—especially when it comes to accuracy and verifiability—are becoming painfully clear.

The Bigger Picture: AI, Policy, and Public Trust

This isn’t just about a few bad citations. It’s about the broader implications of using AI in policy-making and public communication. When a government report is shown to contain fabricated or unverifiable references, it doesn’t just embarrass the authors—it erodes public trust in the institutions they represent.

Historically, policy reports have been subject to peer review, fact-checking, and editorial oversight. But as AI tools become more accessible, there’s a growing temptation to bypass these safeguards in the name of efficiency. The MAHA report scandal is a wake-up call: AI can be a powerful ally, but it’s not a substitute for rigorous human oversight.

Real-World Applications and Lessons Learned

The MAHA report is far from the first case of AI-generated errors making headlines. In academia, researchers have reported instances where LLMs invent citations, misquote sources, or fabricate data. In journalism, AI-generated news articles have occasionally included false or misleading information. And in business, companies have had to pull AI-generated marketing materials or investor reports after discovering inaccuracies.

What’s unique about the MAHA report is the scale and visibility of the error. This is a federal government document, intended to shape national health policy. The stakes couldn’t be higher.

So, what can be done? For starters, organizations using AI for research and policy work need to implement robust review processes. Every AI-generated citation, fact, or recommendation should be independently verified by human experts. Transparency is also key: if AI is used to draft or review a report, that should be disclosed to readers and stakeholders.
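
None of the reporting on the MAHA report describes whatever review tooling HHS did or did not use, so treat the following as a minimal sketch of what automated citation triage could look like: a small script that queries the public Crossref REST API for each cited title and flags anything without a plausible match for human follow-up. The endpoint and response fields are Crossref's; the helper name and the placeholder title are purely illustrative, and a "no match" result is a prompt for a librarian, not proof that a citation is fake.

```python
# Hypothetical citation-triage helper: look up each cited title in Crossref
# and surface candidate matches (or the absence of any) for human review.
import requests

CROSSREF_WORKS = "https://api.crossref.org/works"

def find_candidate_matches(title: str, max_results: int = 3) -> list[dict]:
    """Search Crossref for works whose titles resemble the cited title."""
    resp = requests.get(
        CROSSREF_WORKS,
        params={"query.title": title, "rows": max_results},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return [
        {
            "title": " ".join(item.get("title", ["<untitled>"])),
            "doi": item.get("DOI"),
            "journal": " ".join(item.get("container-title", [""])),
        }
        for item in items
    ]

if __name__ == "__main__":
    # In a real workflow these titles would be parsed from the report's
    # reference list; this one is a placeholder, not an actual citation.
    cited_titles = ["Example cited study title goes here"]
    for cited in cited_titles:
        matches = find_candidate_matches(cited)
        print(f"Cited: {cited}")
        if not matches:
            print("  No Crossref match found -> flag for manual verification")
        for m in matches:
            print(f"  Candidate: {m['title']} ({m['journal']}) DOI: {m['doi']}")
```

A script like this catches only one failure mode, a reference that cannot be located at all; misattributed findings and out-of-context quotes still require a human reader with the actual paper in hand.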

Industry Perspectives and Expert Reactions

AI experts are quick to point out that while LLMs like ChatGPT are incredibly capable, they’re not infallible. “The expectation from an AI expert is to know how to develop something that doesn’t exist,” says Vered Dassa Levy, Global VP of HR at Autobrains, highlighting that AI specialists are often tasked with creating new solutions—not just replicating existing ones[4]. But when it comes to research and policy, the emphasis should be on accuracy and verifiability, not just creativity.

Many in the tech industry are calling for clearer guidelines and best practices for using AI in policy and research. Some suggest that AI tools should be used as drafting assistants, not as final arbiters of truth. Others argue for “AI-augmented” rather than “AI-generated” research, where human experts remain firmly in the driver’s seat.
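
To make the "drafting assistant, not final arbiter of truth" idea concrete, here is one possible shape for an AI-augmented drafting step, sketched with the OpenAI Python SDK: the model writes against a source list a human reviewer has already vetted and is instructed to say so when those sources do not support a claim. The source list, prompt wording, and model name are assumptions for illustration, not anyone's documented workflow; the design choice that matters is that citations can only come from material a person has already verified.

```python
# Sketch of an "AI-augmented" drafting step: the model may cite only sources
# a human reviewer has vetted. Model name, prompt, and sources are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

VETTED_SOURCES = [
    "[1] Example vetted study, Journal of Placeholder Research, 2024.",
    "[2] Example vetted report, Hypothetical Agency, 2023.",
]

def draft_with_vetted_sources(question: str) -> str:
    """Draft text that cites only the numbered sources provided above."""
    source_block = "\n".join(VETTED_SOURCES)
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a drafting assistant. Cite only the numbered "
                    "sources below. If they do not support a claim, say so "
                    "explicitly instead of inventing a reference.\n\n"
                    + source_block
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# A human editor still reviews the draft and re-checks every bracketed citation
# against the vetted list before anything goes into a published report.
```
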

Comparing AI-Driven Research Tools

To put things in perspective, let’s compare how different AI tools are used in research and policy work:

| Tool/Platform | Strengths | Weaknesses | Typical Use Case |
| --- | --- | --- | --- |
| ChatGPT (OpenAI) | Fast drafting, idea generation | Prone to hallucinations, errors | Early drafts, brainstorming |
| Google Scholar | Vast academic database | Limited synthesis, manual review | Literature review, citation |
| Elicit | AI-powered literature review | Still requires human verification | Research synthesis, summaries |
| Consensus | Evidence-based answers | Limited to published research | Quick fact-checking, summaries |

The takeaway? AI can help with drafting and discovery, but it’s not a substitute for rigorous human review—especially in policy and science.

Future Implications: Where Do We Go From Here?

As someone who’s followed AI for years, I see the MAHA report scandal as a turning point. It isn’t just an amusing story about government blunders; it’s a warning about the limits of the technology and the enduring importance of human expertise.

Looking ahead, we can expect more scrutiny of AI-generated content in policy and research. Regulators, journalists, and the public will demand greater transparency about how AI is used in government reports. And organizations will need to invest in training and oversight to prevent similar mistakes.

There’s also a bigger question here: as AI becomes more embedded in our institutions, how do we ensure that it serves the public good—without undermining trust or accuracy? The answer, I suspect, lies in striking a balance: embracing the efficiency and creativity of AI, while keeping human judgment at the center of critical decisions.

Conclusion and Final Thoughts

The RFK Jr. MAHA report scandal is both a cautionary tale and a catalyst for change. It highlights the risks of over-reliance on AI in policy-making, the importance of rigorous fact-checking, and the need for transparency in how technology is used in government.

As the dust settles, one thing is clear: the future of policy and research will be shaped by how well we integrate AI—not as a replacement for human expertise, but as a tool to enhance it. The MAHA report may have stumbled out of the gate, but if it leads to better practices and greater accountability, it could still leave a positive legacy.

Excerpt (for preview):
RFK Jr.’s MAHA report scandal spotlights AI’s risks in policy-making, as fabricated citations erode trust and prompt calls for better oversight and transparency in government research[1][2].

Tags:
generative-ai, ai-ethics, large-language-models, public-policy, healthcare-ai, ChatGPT, data-verification, government-ai

Category:
ethics-policy


By the way, if you thought AI couldn’t make government more interesting—or more embarrassing—think again. In 2025, the stakes are higher than ever, and the lessons of the MAHA report will be remembered as a turning point in the debate over AI, policy, and public trust.
