Is the “Dragon Quest–Style AI” Still Alive?

Why Generative AI Struggles to Become a Certified Medical Device

Introduction

When we think of medical devices, what often comes to mind are massive, multimillion-dollar machines like CT scanners.
But today, even software and mobile apps can be certified as medical devices.

Indeed, image-diagnosis support software and ECG analysis apps have already been approved under Japan’s Pharmaceuticals and Medical Devices Act (the PMD Act) and are widely used in hospitals.

So what about applications built on generative AI and large language models (LLMs), like ChatGPT?
Unfortunately, under the current regulatory framework, it is extremely difficult for such systems to obtain medical device certification.

Drawing from my own consulting and development experience—such as building AI-based dietary guidance for diabetes patients and health-coaching systems based on body measurement data—this article explores:

  • Why generative AI faces significant barriers to certification as a medical device
  • How AI can still play a role in healthcare despite these limitations

What Does “Medical Device Certification” Mean?

In Japan, people place enormous trust in authority. And few things carry more authority in the medical business than government certification of a device.

To protect that trust, every medical device must undergo strict review by the Ministry of Health, Labour and Welfare (MHLW) and the Pharmaceuticals and Medical Devices Agency (PMDA). Evaluations emphasize three key points:

  1. Safety – Are there risks of malfunction or misdiagnosis?
  2. Efficacy – Is its medical effectiveness objectively proven?
  3. Reproducibility – Will it deliver the same results regardless of who uses it, or when?

Because lives are directly at stake, these requirements are enforced with extreme rigor.


Why Generative AI Struggles to Qualify

1. Probabilistic, Unstable Outputs

LLMs generate “probabilistic best guesses,” meaning the same input may yield different results. This inherent variability conflicts with the consistency and reproducibility demanded of medical devices.
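To make that variability concrete, here is a toy Python sketch. It is not a real LLM, just a hand-written next-token distribution with illustrative numbers, but it shows how temperature-based sampling can return different continuations for the very same input:

```python
import random

# Toy next-token distribution an LLM might assign after some fixed prompt.
# The tokens and probabilities are illustrative only.
next_token_probs = {
    "prediabetes": 0.45,
    "type 2 diabetes": 0.35,
    "normal glycemia": 0.20,
}

def sample_next_token(probs):
    """Pick one token at random, weighted by probability, as temperature > 0 decoding does."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# The "input" never changes, yet the output can differ from run to run.
for trial in range(5):
    print(trial, sample_next_token(next_token_probs))
```

Greedy decoding (always taking the most probable token) removes this particular randomness, but in real deployments batching and floating-point non-associativity on GPUs can still make nominally identical runs diverge, which is exactly the kind of behavior reproducibility reviews are designed to catch.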

Moreover, LLM inference typically runs in half-precision (16-bit) floating point to keep memory and compute costs manageable. Many physicians would understandably ask: “Can medical judgment really be trusted at that level of numerical precision?”
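To illustrate what reduced precision means numerically (a caricature, of course; real inference stacks use careful accumulation and mixed precision), here is a small NumPy sketch comparing a naive running sum in float16 and float32:

```python
import numpy as np

# Summing 10,000 increments of 0.0001 should give 1.0.
acc16 = np.float16(0.0)
acc32 = np.float32(0.0)
for _ in range(10_000):
    acc16 = np.float16(acc16 + np.float16(0.0001))
    acc32 = np.float32(acc32 + np.float32(0.0001))

print(acc16)  # stalls well below 1.0: the tiny increments vanish under float16 rounding
print(acc32)  # close to 1.0
```

Half precision is generally adequate for the model weights themselves, and it is part of what makes LLM inference affordable; the point is simply that numerical behavior becomes one more thing a reviewer will ask to have characterized.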

2. The Black-Box Problem

Generative AI cannot clearly explain how it arrives at its conclusions. Since medical devices must demonstrate transparent, explainable algorithms, this opacity is a critical flaw.

3. Unclear Training Data

LLMs are trained on massive datasets—including public sources—without guaranteed accuracy or traceability. The risk of misinformation makes regulatory approval almost impossible.

In contrast, the field of specialized AI (narrow-domain systems) thrives on carefully curated training data. Yet in medicine, where clinicians and researchers are already stretched thin, expecting them to curate datasets of that scale is highly impractical.


AI That Can Be Certified

Some AI-based medical devices have already been approved. Their shared characteristics:

  • Narrow, well-defined use cases (e.g., lung cancer diagnosis support, diabetic retinopathy image analysis)
  • Performance validated quantitatively
  • Identical outputs for identical inputs

In other words, only domain-specific, purpose-built AI can gain approval—not general-purpose generative AI.

Crucially, these systems almost always rely on rule-based AI. This approach follows human-defined conditional logic or expert knowledge bases.
Think of it as the same concept behind the battle AI in Dragon Quest IV, released back in 1990.

Yes: despite decades of progress, healthcare still depends on AI methods that are roughly 35 years old. But this preference is not unique to medicine. Finance and other “zero-failure tolerance” industries also consistently favor proven stability over bleeding-edge innovation.

In short, risk minimization is the top priority—and that is the core reason generative AI struggles to pass medical device certification.
In fact, in projects I’ve been involved with, the final implementation often ended up reverting to rule-based AI.
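For a sense of what that rule-based core looks like, here is a deliberately tiny, hypothetical fragment in the spirit of the dietary-guidance work mentioned in the introduction. The thresholds and messages are illustrative only, not clinical guidance:

```python
def dietary_guidance(fasting_glucose_mg_dl: float, bmi: float) -> str:
    """Human-authored conditional logic: every branch can be listed, reviewed, and tested."""
    if fasting_glucose_mg_dl >= 126:
        return "Refer to a physician and follow the prescribed meal plan."
    if fasting_glucose_mg_dl >= 100 and bmi >= 25:
        return "Reduce refined carbohydrates and aim for gradual weight loss."
    if fasting_glucose_mg_dl >= 100:
        return "Watch carbohydrate intake and re-test in three months."
    return "Maintain the current balanced diet."

# Identical inputs always yield identical outputs.
print(dietary_guidance(fasting_glucose_mg_dl=118, bmi=27.3))
```

Unglamorous, certainly, but the decision space is finite and enumerable, which is precisely what safety and reproducibility reviews can verify.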


Future Outlook

So, does this mean generative AI has no role in healthcare?
Are we doomed to endlessly maintain old-fashioned rule-based systems?

Not at all. Efficiency matters in medicine too. Generative AI can bring value in non-critical, supporting roles, such as:

  • Streamlining physician documentation
  • Rewriting patient materials in plain language
  • Clarifying results from certified medical AI tools

Moreover, the future does not have to be “all or nothing.” Combining LLMs with rule-based AI may open powerful new possibilities.
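One possible shape of such a hybrid is sketched below: the certified rule-based core makes the decision, and the LLM only rephrases it for the patient. Here rewrite_in_plain_language is a hypothetical stand-in for whatever LLM call would actually be used.

```python
def rule_based_decision(fasting_glucose_mg_dl: float) -> str:
    """Deterministic, reviewable logic: the part that would be certified."""
    if fasting_glucose_mg_dl >= 126:
        return "REFER_TO_PHYSICIAN"
    if fasting_glucose_mg_dl >= 100:
        return "LIFESTYLE_GUIDANCE"
    return "NO_ACTION"

def rewrite_in_plain_language(decision_code: str) -> str:
    """Hypothetical stand-in for an LLM call that rewords, but never changes, the decision."""
    return f"[patient-friendly explanation of {decision_code}]"

decision = rule_based_decision(fasting_glucose_mg_dl=118)  # deterministic core
message = rewrite_in_plain_language(decision)               # generative wrapper
print(decision, "->", message)
```

The important design choice is the direction of the dependency: the generative layer consumes the certified output but can never override it, so the variability discussed earlier stays outside the clinical decision path.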

In fact, my current consulting work focuses on exactly this: ensuring compliance with medical device standards while carefully integrating LLMs where they add value.


Key Takeaways

  • Medical device approval requires safety, efficacy, and reproducibility.
  • LLMs and generative AI struggle with these requirements due to unstable outputs, opacity, and unclear training data.
  • Only specialized, narrow-domain AI is currently eligible for certification.
  • Generative AI is better positioned to support workflow efficiency than to serve as a medical device itself.
  • A hybrid approach—LLM + rule-based AI—offers the most promising path forward.

Closing Thoughts

Healthcare IT is still a rapidly growing business. I regularly receive inquiries, including from overseas firms in Taiwan, about launching new applications such as appointment-booking platforms.

The industry cannot afford to remain stuck in purely conservative thinking. Even I, who have shipped rule-based AI products, recognize the limits of relying solely on classical methods.

That is precisely why the greatest challenge ahead is this:

How can we combine generative AI with proven traditional techniques to unlock the next generation of medical AI?
