Top 10 Challenges of Deploying Clinical AI


Many facilities find it frustrating to deploy their artificial intelligence algorithms. Deployments often take many months and consume countless staff hours and resources, yet the results are weak: a single point solution focused on one specific pathology gets implemented, and then the process begins again. AI faces many obstacles on the way to realizing its true potential, and understanding these top 10 challenges of deploying clinical AI will help organizations be more successful.

1. Radiology data is highly nuanced and unintended bias is hard to avoid without proper education.

Radiology data is highly nuanced, and unintended bias is hard to avoid without proper education. This poses a particular challenge for data scientists, who tend to oversimplify everything as “just a data problem”. When building a diagnostic aid, it’s important to understand the difference between a diagnosis and patterns suggestive of a diagnosis. While it’s reasonably easy to say with some degree of confidence that a chest x-ray in isolation shows a calcified lung nodule, a pleural effusion, and consolidation, making the leap to diagnosing tuberculosis from nothing beyond a JPEG is another story. These distinctions are often difficult to grasp without a medical degree, but they are critical for anyone building tools.

Healthcare data is heavily nuanced

2. Data scientists and engineers need support from many experts – not just radiologists.

Building holistic solutions that are robust to the Wild West standards of imaging and seamless in their ability to integrate into workflow requires knowledge of the entire diagnostic lifecycle. The acquisition parameters of a CT or MR drive the intent and the limitations of interpreting that image. Different studies call for different preset windowing levels or reconstruction kernels, which affect the visual appearance of a scan and can impose constraints on how that scan is to be read. MRI has dozens of settings that change the physical properties of a study and that are out of the radiologist’s control by the time of interpretation. When building a model that interprets DICOM data, you’re beholden to these same limitations and need to be mindful of them. Beyond the technologist and radiologist is a whole team of stakeholders who also need to be considered in the flow of data and the impact that AI may have on them.
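To make this concrete, here is a minimal sketch, assuming the pydicom library and a hypothetical file “study.dcm”, of the kind of acquisition metadata a model pipeline should inspect before trusting the pixels:

import pydicom

# Read one hypothetical CT slice from disk.
ds = pydicom.dcmread("study.dcm")

# Windowing presets describe how the scan was meant to be displayed.
window_center = ds.get("WindowCenter", "n/a")
window_width = ds.get("WindowWidth", "n/a")

# The reconstruction kernel affects sharpness and noise, and can shift
# a model's input distribution from one site or protocol to the next.
kernel = ds.get("ConvolutionKernel", "n/a")

print(f"Window: {window_center}/{window_width}, kernel: {kernel}")

A model trained only on scans reconstructed with one kernel may quietly degrade on another, which is exactly the kind of constraint a builder has to surface up front.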

3. More time is spent building tools enabling us to build clinical AI than actually building clinical AI.

Since our inception we have had to develop standardization tools, anonymization tools, labeling tools, training clusters, and PACS integration tools. All of these need to exist before we can develop sustainable systems, and it’s hard to go without them. Most of these tools require their own product design and discovery, with less than straightforward requirements. In order to label our data, we’ve had to develop sophisticated NLP to generate labels from radiology reports. While that’s good enough to start, a good deal of strongly supervised human labeling is usually still required. This is extremely expensive, and while educational, a bit of a distraction from building and deploying actual models.

Tools for standardization, anonymization, and labeling are needed first
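As a rough illustration of the report-labeling step, here is a minimal sketch of keyword-based weak labeling with crude negation stripping. It is a hypothetical toy, not our actual NLP; real systems need proper negation, uncertainty, and context handling:

import re

# Hypothetical finding -> pattern map; a real system covers far more.
PATTERNS = {
    "pleural_effusion": re.compile(r"\bpleural effusion\b", re.I),
    "consolidation": re.compile(r"\bconsolidation\b", re.I),
    "nodule": re.compile(r"\bnodule\b", re.I),
}

# Crudely matches a negated clause up to the end of its sentence.
NEGATION = re.compile(r"\b(no|without|negative for)\b[^.]*", re.I)

def weak_labels(report: str) -> dict:
    """Return {finding: bool} labels mined from one free-text report."""
    positive_text = NEGATION.sub("", report)  # drop negated clauses first
    return {name: bool(p.search(positive_text)) for name, p in PATTERNS.items()}

print(weak_labels("Small right pleural effusion. No consolidation."))
# -> {'pleural_effusion': True, 'consolidation': False, 'nodule': False}

The gap between what a matcher like this catches and what the radiologist actually meant is precisely why so much strongly supervised human labeling remains necessary.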

4. Privacy and security are subjective measures with moving targets, especially on a global scale.

While the US recognizes “Safe Harbor” guidelines as safe enough, each privacy team in the US has its own rules, and the real goal is to get buy-in on a case-by-case basis. To many, the proper rule is “expert opinion,” but there are no guidelines as to what an expert is. This has meant developing highly customizable solutions for de-identification and data storage. Anonymization tools need to occlude burned-in text in images, encrypt DICOM headers, and, most difficult of all, remove identifiers from free-text reports.
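For the header portion alone, a minimal de-identification sketch with pydicom might look like the following. The tag shortlist is a hypothetical illustration; a production tool follows the full DICOM confidentiality profile and still has to handle burned-in pixel text and free-text reports separately:

import pydicom

# Hypothetical shortlist of identifying attributes to blank out.
IDENTIFYING_TAGS = [
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "ReferringPhysicianName",
    "InstitutionName",
]

def deidentify(path_in: str, path_out: str) -> None:
    ds = pydicom.dcmread(path_in)
    for tag in IDENTIFYING_TAGS:
        if tag in ds:
            # Blank rather than delete, to keep the file schema-valid.
            ds.data_element(tag).value = ""
    # Private tags often hide site-specific identifiers.
    ds.remove_private_tags()
    ds.save_as(path_out)

deidentify("original.dcm", "deidentified.dcm")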

Further, deployment architecture requires the full-time focus of a proper cybersecurity expert and must offer both on-site and cloud options; today, most providers prefer that their data never leave the network. Solving these problems on a global scale is considerably more challenging. Ultimately, most people care about privacy and security, and they absolutely should treat them as a top priority. The problem is that too much of this is left to opinion rather than standards; clearer documented guidelines would save us all a lot of trouble.

5. Most hospitals do not design software infrastructure with “ease of AI integration” in mind.

When I first began at Enlitic, my job was Forward Deployment Engineer. My first task was to travel to a major provider and determine how to access their 100TB+ of historical data. Unsurprisingly, this quickly became an exercise in patiently reverse engineering ancient systems while fearing the whole system might collapse. The naïve engineer always thinks “copy everything directly from the disk” without consulting the PACS; you quickly learn that this often degrades the quality of the metadata stored in the DICOM headers. What usually follows is a continuous loop of C-FIND/C-MOVE commands to pull studies properly from the PACS, but this can take ages and put a lot of strain on a live system.
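For readers unfamiliar with that loop, here is a minimal sketch of a study-level C-FIND query using the pynetdicom library; the host, port, and AE titles are hypothetical placeholders for a real PACS:

from pydicom.dataset import Dataset
from pynetdicom import AE
from pynetdicom.sop_class import StudyRootQueryRetrieveInformationModelFind

ae = AE(ae_title="RESEARCH_SCU")
ae.add_requested_context(StudyRootQueryRetrieveInformationModelFind)

# Query identifier: all CT studies acquired on a given day.
query = Dataset()
query.QueryRetrieveLevel = "STUDY"
query.StudyDate = "20230101"
query.ModalitiesInStudy = "CT"
query.StudyInstanceUID = ""  # ask the PACS to fill this in

assoc = ae.associate("pacs.example.com", 104, ae_title="PACS_SCP")
if assoc.is_established:
    responses = assoc.send_c_find(query, StudyRootQueryRetrieveInformationModelFind)
    for status, identifier in responses:
        # Pending statuses (0xFF00/0xFF01) carry matches; throttle here
        # to avoid putting undue strain on a live clinical system.
        if status and status.Status in (0xFF00, 0xFF01):
            print(identifier.StudyInstanceUID)
    assoc.release()

Each matched study then needs its own C-MOVE to actually transfer the images, which is where the time and the strain on a live system really come from.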

If grabbing historical data is this challenging, imagine translating that into deploying a tool that sits on top of existing infrastructure. Platforms that monitor existing PACS and RIS systems, intercepting incoming studies in real time, augmenting them with overlays and reports, and pushing them back before a radiologist even opens them present a real challenge. Such a platform provides a great experience for the radiologist, but building it without breaking anything is a very challenging one for the engineers.
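The receiving half of such a platform looks deceptively simple, which is part of why the operational difficulty is easy to underestimate. Here is a minimal sketch of a DICOM listener built with pynetdicom, with a hypothetical port and a placeholder where inference and push-back would happen:

from pynetdicom import AE, evt, AllStoragePresentationContexts

def handle_store(event):
    """Called once for each incoming DICOM instance."""
    ds = event.dataset
    ds.file_meta = event.file_meta
    # In a real deployment, this is where inference would run and the
    # augmented study would be pushed back to the PACS via C-STORE.
    print(f"Received {ds.SOPInstanceUID} ({ds.get('Modality', '?')})")
    return 0x0000  # success status

ae = AE(ae_title="AI_GATEWAY")
ae.supported_contexts = AllStoragePresentationContexts
handlers = [(evt.EVT_C_STORE, handle_store)]

# Blocks and listens for incoming studies on port 11112.
ae.start_server(("0.0.0.0", 11112), evt_handlers=handlers)

Everything that makes this hard happens around that placeholder: running fast enough, for every modality, without ever corrupting or delaying a study a radiologist is waiting on.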

As an industry, we should continue to push medical providers to adopt more up-to-date infrastructure, and for more reasons than just improving AI adoptability. Beyond that, open standards and APIs will save everyone a lot of time and pain.

6. Models don’t get regulatory approvals, claims around very specific uses of a model do.

FDA Approvals for AI are Specific to Pathologies

I’m always surprised to see that every study benchmarks model performance against human-level performance when (hopefully) no one is trying to make a claim about replacing a radiologist. If a tool is going to be used by a doctor to enhance their performance, then the real test is doctor alone versus doctor using the tool. This leads to the wider point: a model doesn’t get blanket regulatory approval; claims around very specific uses of a model do.

These challenges are hard enough to overcome when understood, and far more daunting when we recognize them properly as unknown unknowns. How does one design a trial that confidently validates a tool which identifies “normal” studies up front? How about one which assists a doctor in treating cancerous nodules two years earlier than they otherwise might? I’m not sure the FDA knows that for sure, and so I wouldn’t claim to either.

7. Validation is expensive but important to repeat for each target population.

In keeping with the top 10 challenges of deploying clinical AI, validation is EXPENSIVE. It turns out it’s not cheap to hire 17 radiologists to read each of 300-500 CTs twice, once with and once without your tool helping them. It’s also not cheap to find data that meets all the requirements to do so, or to hire a third party to house it in order to maintain independence. It’s bad enough that we have to do this for each study type and diagnosis we make claims around; it’s even worse when you realize that validating on an American population for the FDA may mean little without also validating on a Japanese population for the PMDA.
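A back-of-envelope calculation makes the point. The case counts come from the paragraph above, while the per-read rate is purely a hypothetical illustration:

readers = 17
cases = 400            # midpoint of the 300-500 CTs mentioned above
reads_per_case = 2     # once with the tool, once without
cost_per_read = 50.0   # hypothetical USD rate per CT interpretation

total_reads = readers * cases * reads_per_case
print(f"{total_reads} reads, ~${total_reads * cost_per_read:,.0f} in reader fees")
# -> 13600 reads, ~$680,000 in reader fees

And that is reader fees alone, before data acquisition, third-party hosting, or statisticians enter the budget.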

Every population has different anatomic variances, different imaging protocols, different incidences of different abnormalities, and different definitions of what justifies follow up. The same is true when comparing an inpatient setting to an outpatient, screening, or emergency room setting. Proper validation is critical to ensure proper outcomes for patients and requires testing on the target population.

For all these reasons, it would be massively beneficial for governing bodies to adopt international standards and collaborate on building datasets not just for training, but for validation, perhaps even with a Kaggle-like layer of abstraction to prevent overfitting.

8. Workflow integration needs to be as convenient and non-threatening as possible.

There’s no sense in spending so much time and effort developing and validating a tool if no one is going to use it, so workflow integration is a critical consideration that should be thought about from the very start of development. Historically, CAD has taken a lot of heat for increasing read times and false positive rates rather than reducing them. No one wants to switch between a different software system for each type of study unless absolutely necessary. Further, it’s a tough sell to a radiologist when your pitch is to replace them. Solving these problems requires a concerted approach to user experience research in the field of radiology. Unfortunately, this is rarely considered, and even when it is, it is difficult to do well.

The bar for an engineer to sit in a real clinical setting and watch live interpretation is often quite high—as it probably should be. But case studies and readily accessible footage of live interpretation can seriously help good product thinking at an early stage.

9. Adoption requires buy-in from every stakeholder, but they all speak different languages.

Selling a tool into a radiology setting requires interest and approval from the patient, the radiologist, and the financier who ultimately needs to sign the check. Patients will ultimately be happy with anything they perceive to improve their outcomes. Radiologists want the same, but without compromising their workflow or job security. Executives, though, often want whatever reduces costs by the widest margin. In many cases, these value propositions are at direct odds with one another, and this poses a considerable challenge to market adoption.

In theory, it should be easy to prove simultaneous reductions in read times and false positive rates alongside improvements in true positive rates. Translating this directly into economic benefit, though, is often quite challenging without more readily available data on the economics of radiology. How does one put a price on detecting a life-threatening finding that might otherwise have been overlooked?

10. Not all problems in radiology are equal; some are more impactful (or attainable) than others.

To a non-radiologist, the term “lung cancer product” might mean something. To a radiologist, it probably just leads to a lot more questions. Is it nodule detection? Maybe for diagnosis? Does it provide characterization? Measurement? If measurement, is it volume analysis or simple axes? If axes, is it relative to the long, short or average axis? These are all problems with different implications in terms of the value they bring and the challenge in developing them.
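A small sketch shows why the definition matters. On a hypothetical rectangular “nodule” mask with an assumed 0.7 mm pixel spacing, voxel-count area and maximum-diameter long axis describe the same lesion in ways that are not interchangeable:

import numpy as np

mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 25:35] = True  # hypothetical 20 x 10 pixel nodule
spacing_mm = 0.7           # assumed isotropic in-plane spacing

# Definition 1: area by voxel counting (extends to volume in 3D).
area_mm2 = mask.sum() * spacing_mm ** 2

# Definition 2: long axis as the maximum distance between mask points.
ys, xs = np.nonzero(mask)
pts = np.column_stack([ys, xs]).astype(float) * spacing_mm
dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
long_axis_mm = dists.max()

print(f"area {area_mm2:.0f} mm^2, long axis {long_axis_mm:.1f} mm")
# -> area 98 mm^2, long axis 14.7 mm

Two products reporting those two numbers are answering different clinical questions, with very different development costs behind them.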

To ensure we’re solving important problems, we need to know what the important problems are. In many cases, these aren’t even problems of detection or diagnosis. Beyond further education on radiology, clearer emphasis on the specific clinical needs can focus the development efforts of companies in the space.

Conclusion

Artificial intelligence is not about “Man vs Machine”. AI will augment the human side of radiology, much like Google assists authors, coders, bloggers, and teachers. Recognizing these top 10 challenges of deploying clinical AI is just the first step toward fully realizing the potential of AI in radiology.


Guest Blog: Kevin Lyman, Founder and Chief Science Officer, Enlitic