AI could "reinforce health inequities" if data not diverse
Researchers found evidence of poor data
Researchers have warned that new AI technologies could reinforce healthcare inequalities if the datasets they are based on are not representative of diverse communities.
Experts from INSIGHT, the UK's Health Data Research Hub for eye health, and University Hospitals Birmingham NHS Foundation Trust are describing this as 'health data poverty'.
In recent years, many academic and commercial organisations have been developing artificial intelligence and other digital health technologies based on publicly available datasets. But there is little information about how many datasets actually exist or the diversity of people and health conditions represented within them, which could lead to the development of technologies and products that only work for certain groups or countries.
The research team included experts from Moorfields Eye Hospital NHS Foundation Trust, the London School of Hygiene and Tropical Medicine, McGill University in Montreal, and Health Data Research UK.
Focusing on eye health, a leading area for digital innovation in healthcare, consultant eye specialist and director of INSIGHT Professor Alastair Denniston and his colleagues carried out a global search for publicly available datasets to assess how many exist and the extent to which they represent the diversity and needs of the world’s population.
They analysed 94 datasets containing 507,724 clinical images and 125 videos of eyes gathered from at least 122,364 people. They then created a comprehensive catalogue detailing the source of each dataset, its accessibility, and the populations, diseases and types of images represented within it.
The team found that most images came from populations in Asia and Europe, with very few datasets from large parts of the world such as sub-Saharan Africa (just one dataset) and South America (two). Looking closer, they discovered that information about the people within each dataset was generally poor, with basic demographic information such as age, sex and ethnicity missing from more than one in five (20%) datasets.
This lack of geographical diversity and demographic data is a concern because technologies developed using data from one population may not work effectively when applied to a different group of people or part of the world.
There were also significant disparities in the types of diseases covered by the datasets. The majority of images were relevant to diseases such as diabetic retinopathy, glaucoma and age-related macular degeneration, mainly because these images are routinely collected as part of healthcare and screening in countries with advanced modern health infrastructure.
However, cataracts, trachoma and refractive error, which have been labelled priority eye diseases by the World Health Organization and account for half of the world's blindness, were significantly under-represented. These conditions are common in low- and middle-income countries, where digital technology could make a big difference in making healthcare more accessible.
These conditions have not traditionally included imaging as part of their standard assessment, but there is increasing evidence that AI technologies could play a role. However, the lack of relevant data for developing and training AI-based tools makes it less likely that researchers and companies will be able to develop products that could help.
The researchers are saying this situation risks creating ‘health data poverty’ for populations and countries that are under-represented in current health datasets.
“We hope that our catalogue will raise awareness of more diverse datasets for the development of AI-based health technologies,” Professor Denniston said. “We need to act now to encourage health systems and researchers to invest in publicly available datasets to support research and innovation in areas that are currently data poor. Otherwise, we risk perpetuating a growing digital divide where healthcare technologies are only developed to benefit diseases, populations and countries with advanced data infrastructure.”