
What Are The Challenges in Evaluating New Digital Technologies and AI?

The past five years have seen growing awareness of, and enthusiasm for, a wave of new technologies in health centred on digital tools and AI.

When it comes to evaluation, the focus has historically been on the technical capabilities of these tools. However, as we increasingly look to implement them across various use cases, whether remote patient monitoring (RPM) tools, AI chatbots, or even genetic decision support tools, how do we evaluate them?

There is a growing need to evaluate these technologies beyond their technical capabilities. Google DeepMind’s team, a pioneer of AI model development, has emphasised the need to focus on socio-technical impacts: those that arise from how humans interact with these technologies, as well as wider health system effects.

Digital health and AI tools can't be treated as a single bucket, though. They represent a diverse range of technologies, products, and use cases with different target audiences. Bearing that in mind, here is a broad overview of why we need to think differently when evaluating these technologies.

The changing nature of new technologies

When it comes to evaluation, some of these new technologies don't easily lend themselves to traditional methods like the 'gold standard' randomised controlled trial (RCT).

Whether it is AI algorithms, smart sensors, or RPM tools, the nature of tech development is iterative. Software is an underlying component of many of these tools, and the development cycle of software is fundamentally dynamic. Not only are features and functionality changing, especially based on user feedback, but the scope of the product can change over time, too, as companies merge, make tools interoperable, or even pivot their business model.

All this change means that, by the later stages of implementation, we are sometimes looking at a different product from the one we started with. So, if what we are evaluating keeps changing, the RCT model, or even comparison against counterfactual scenarios, becomes difficult to apply.

It's worth mentioning that many of the challenges below are not unique to tech; complex out-of-hospital interventions face many of the same issues.

The importance of context and implementation

Another factor in evaluating digital tools is that context matters a lot: how these tools are used, by whom, and for what. A big part of this context is how the tools are implemented. The same product can often be implemented in different models, used for various use cases, and featured in varying ways as part of a patient pathway. Add to that the fact that clinical practice isn't as standardised in the real world as you might think, so tools are used in varying ways and reflect variation in clinical management. All this, and we haven't even covered local context and capabilities.

This can make it hard to develop an overarching view of whether a specific tool, product, or algorithm "works or not," let alone something broad like generative AI.

The rise of powerful companion tools

With the emphasis on improving productivity in health, these tools often focus on automating parts of the clinical workflow or providing health workers with insights and intelligence about the patient. The difference here is that, increasingly, a digital or AI product is not a 'treatment' in and of itself but part of the workflow of diagnosis or treatment. This makes it tricky to attribute impact: these tools may shape treatment, but on their own they might not deliver any benefit.

In the simplest case, consider an RPM tool. It alone cannot improve your chronic condition; you need good doctors and support staff to monitor and provide clinical guidance, and you need the patient to engage with the tool and act on that advice. It gets more complicated with cutting-edge tools like those that could provide treatment recommendations based on genetic analysis. These algorithms use various factors, including your genetic profile, to recommend the treatment most likely to work for you. What if the treatment isn't successful? Is that because the algorithm recommended the wrong treatment, or because the treatment wasn't delivered effectively? And if it works, where do we attribute the impact?

The significant role of socio-technical analysis

Google DeepMind's team pointed out that the wider human interaction and system-wide impacts of these new technologies will be key to understanding their safety. We cannot simply assess the technical capabilities of the tools; we must also look at their human interaction and systemic impacts. This requires understanding users' relationships with the tools and how they interact with broader system incentives and context. For example, a chatbot that delivers health information might be great for most people but less well designed for those with low literacy levels. If the latter group stops using it while others continue, and this happens at scale, we could see a significant impact on health inequalities.

If we look closer, though, none of these aspects is entirely new to health. The contraceptive pill likely had significant systemic impacts on society, and things like pathology tests and other diagnostic tools are complementary to actual treatment, and we manage those just fine. It's also true that many health interventions, particularly out-of-hospital ones, have similar complexities in terms of 'context and implementation' being important.

What's different now?

Three things are different here. First, the sheer complexity and number of the challenges described above apply to each individual tool or technology. Add in the fact that we are facing a proliferation of tools and use cases, and the evaluation challenge becomes hard.

Next, there's the transparency problem. It's often very challenging to get insight into why AI models make specific recommendations. In some cases, developers can interrogate the models to understand which factors they took into account, but that has limitations and, at the moment, it's a pretty niche skill. This means the core component of evaluation, which seeks to understand and explain 'what happened' and 'what it resulted in', might have to be revisited.

Finally, the multimodal nature of these tools, spanning images, sound, and especially language, adds new complexity to evaluating human interaction. It makes these technologies easier to anthropomorphise, which adds a whole new set of considerations: we need to understand the impact of things like trust and dynamic relationships with tools in a way we just didn't have to think about with CT scanners and EHRs.

The features of the latest wave of digital and AI tools in health mean we have to revisit our approach to evaluation. While traditional tools like RCTs and cost-benefit assessments will still have a place, especially in longer-term academic studies, we must rethink our methods. These methods will have to capture the dynamic nature of technological development, the context-driven and implementation-driven impacts, and the human and systemic effects.

In a future article, I will cover some promising methodologies that can be useful in evaluating these tools. These include rapid evaluation, micro RCTs and simulation.


To find out more or explore working together, drop me a message.

Helping you to improve health and wellbeing outside hospital walls, in homes, local communities and workplaces

Email

hello@eudaimoniahealth.uk
