New Delhi: Hoping to address concerns about bias in AI results, researchers from IIT Jodhpur have developed a framework to score datasets on the 'fairness, privacy and regulatory' scale for use in algorithms in the Indian context.
AI experts have consistently voiced concerns around the use of Western datasets in developing AI systems. These tend to induce a bias in results, possibly rendering the system ineffective for the Indian context.
"If I were to build a face recognition system specifically for India, I would prioritise using datasets that reflect the unique diversity of facial features and skin tones found here, rather than relying solely on datasets developed in the Western world. Western datasets may lack the representative variety needed to capture the nuances of Indian demographics accurately," Mayank Vatsa, IIT Jodhpur professor and corresponding author of the paper describing the framework, told PTI.
A dataset, which is a collection of data or information, is used for training an AI-based algorithm designed to learn to detect patterns in the data.
"When we talk about building a responsible AI-based system or solution, the first step in its design involves figuring out which dataset is to be used. If the dataset has issues, then expecting the AI-model to automatically overcome those limitations is unrealistic," Vatsa said.
The recommendations in the study included collecting data from a diverse population with sensitive aspects such as gender and race, provided in a manner that protects privacy of individuals.
The framework, which also assesses if an individual's personal data is protected, could possibly aid in creating "responsible datasets" and is an attempt towards mitigating ethical issues of AI, the researchers said.
The concept of 'Responsible AI' has its earliest foundations in the 1940s and focussed on machines following rules and ethics as defined by human society.
"We cannot use a dataset, design the system and then realise that the dataset had inaccuracies to begin with. So, why not actually design it after determining if a dataset is useful for me or not -- or responsible or not?" Vatsa said.
The framework, developed with international collaborators, outlines criteria that assess a dataset's "responsibility" -- fairness, privacy and regulatory compliance. The framework operates as an algorithm which produces an 'FPR' score as a result.
'Fairness' measures if a dataset answers questions such as "are different groups of people represented?" 'Privacy' is assessed by identifying vulnerabilities that could potentially lead to a leak of private information. And 'regulatory compliance' looks at institutional approvals and an individual's consent to data collection.
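The question "are different groups of people represented?" can be illustrated with a minimal Python sketch. It is not the researchers' fairness metric, which the article does not spell out. The function names, the toy labels and the `min_share` cutoff are all assumptions made for illustration.

```python
from collections import Counter


def group_shares(labels):
    """Return each group's share of the records as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: count / total for group, count in counts.items()}


def underrepresented(labels, expected_groups, min_share=0.1):
    """List expected groups that are absent or below min_share.

    min_share is a hypothetical cutoff chosen for this example,
    not a value taken from the study.
    """
    shares = group_shares(labels)
    return sorted(g for g in expected_groups if shares.get(g, 0.0) < min_share)


# Toy group labels, invented for illustration only.
labels = ["A"] * 8 + ["B"] * 1 + ["C"] * 1
print(group_shares(labels))
print(underrepresented(labels, ["A", "B", "C", "D"], min_share=0.2))
```

A group that never appears in the data, such as "D" here, is flagged along with groups that appear only rarely.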
The researchers ran their auditing algorithm over 60 datasets from across the world, including widely used ones, and found issues in all of them, highlighting "a universal susceptibility to fairness, privacy and regulatory compliance issues".
Of the 60, 52 datasets were face-based biometric ones, while eight were chest X-ray-based healthcare ones.
The team found that about 90 per cent of the face datasets were neither 'fair' nor 'compliant' -- scoring two or less out of a maximum of five on 'fairness', and zero or one out of three on 'regulatory compliance'.
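The cut-offs quoted above can be written down as a short Python sketch. Only the two maximums (five and three) and the two thresholds (two or less, zero or one) come from the article. The class, the function names and the toy scores are hypothetical, and the privacy component is left out because the article does not give its scale.

```python
from dataclasses import dataclass

FAIRNESS_MAX = 5     # maximum fairness score, as reported in the article
REGULATORY_MAX = 3   # maximum regulatory compliance score, as reported


@dataclass(frozen=True)
class AuditScores:
    """Hypothetical container for two of a dataset's audit scores."""
    name: str
    fairness: float
    regulatory: float

    def __post_init__(self):
        # Reject scores outside the ranges the article describes.
        if not 0 <= self.fairness <= FAIRNESS_MAX:
            raise ValueError("fairness must lie between 0 and 5")
        if not 0 <= self.regulatory <= REGULATORY_MAX:
            raise ValueError("regulatory must lie between 0 and 3")


def is_fair(scores):
    """Not 'fair' means scoring two or less out of five."""
    return scores.fairness > 2


def is_compliant(scores):
    """Not 'compliant' means scoring zero or one out of three."""
    return scores.regulatory > 1


def neither_fair_nor_compliant(scores):
    return not is_fair(scores) and not is_compliant(scores)


# Toy scores, invented for illustration; not results from the study.
audits = [
    AuditScores("toy_faces_a", fairness=1.5, regulatory=0),
    AuditScores("toy_faces_b", fairness=2, regulatory=1),
    AuditScores("toy_faces_c", fairness=4, regulatory=3),
]
flagged = [a.name for a in audits if neither_fair_nor_compliant(a)]
print(flagged)
```

Under these cut-offs, a dataset scoring exactly two on fairness and one on regulatory compliance still counts as neither fair nor compliant.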
"(The audit framework) would facilitate effective dataset examination, ensuring alignment with responsible AI principles," the authors wrote in the study published in the journal "Nature Machine Intelligence" in August.
Further, under 'regulatory compliance', the framework also audits if a dataset respects an individual's 'Right to be Forgotten' -- where one withdraws consent, following which their personal data must be erased from the dataset immediately.
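What honouring such a withdrawal could look like for a dataset keeper is sketched below in Python. This is an assumption-laden illustration, not the framework's audit procedure: the record layout, the `subject_id` field and both function names are hypothetical.

```python
def erase_subject(records, subject_id):
    """Return a copy of the dataset without any record of the given subject."""
    return [r for r in records if r["subject_id"] != subject_id]


def honours_withdrawals(records, withdrawn_ids):
    """Check that no remaining record belongs to a subject who withdrew consent."""
    remaining = {r["subject_id"] for r in records}
    return remaining.isdisjoint(withdrawn_ids)


# Toy records, invented for illustration only.
records = [
    {"subject_id": "s1", "image": "s1_front.png"},
    {"subject_id": "s2", "image": "s2_front.png"},
    {"subject_id": "s1", "image": "s1_side.png"},
]
withdrawn = {"s1"}

print(honours_withdrawals(records, withdrawn))   # data for s1 still present
cleaned = erase_subject(records, "s1")
print(honours_withdrawals(cleaned, withdrawn))   # all s1 records erased
```

The sketch covers only the dataset itself; copies already distributed to other parties, or models already trained on the data, would need separate handling.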
Nitika Bhalla, an AI ethics researcher and research fellow at the Centre for Computing and Social Responsibility, De Montfort University, UK, told PTI, "The paper tries to address concerns around key ethical issues (regarding datasets), such as bias, privacy and manipulation, thereby making an attempt towards Responsible AI."
Bhalla was also a co-author on a December 2023 paper, which proposed 'responsible research and innovation', or RRI -- an analytical approach which relies on scientific research -- for mitigating AI's ethical and societal challenges, and driving Responsible AI in India.
The IIT Jodhpur researchers also outlined recommendations for improving data collection processes and addressing ethical and technical issues in creating and managing datasets.
Datasets related to humans should also receive approval, such as from an Institutional Review Board (IRB) in the US, possibly along with explicit consent from individuals, the authors suggested.
AI ethics researcher Bhalla, however, said the audit framework may face challenges in countries such as those in the Global South, where data protection laws are lacking or where the European Union's (EU) GDPR does not apply.
"No one knows what is happening to their data, how it is used, stored, handled or transferred. There is a lack of transparency and hence, ethical issues (can arise)," she said.
The EU's General Data Protection Regulation (GDPR) took effect in 2018 and is considered one of the most comprehensive privacy laws, with countries such as Brazil and Thailand having adopted comparable laws.
While India's Digital Personal Data Protection Act came into effect in September 2023, it is said to have watered down the scope of the regulator -- the Data Protection Authority (DPA) -- and empowered the state significantly to sidestep individual consent, according to experts.
The act replaced the 2019 version of the bill following its withdrawal.
"It is problematic that the Indian state is not subject to many of the constraints (on processing personal data) that private entities are, especially in cases where there is no pressing requirement for such an exception," said author Anirudh Burman, fellow and associate research director, Carnegie India, New Delhi, a think tank that focuses on technology and society, among other issues.
A lack of institutions engaged in fundamental research on AI in India was another concern voiced in the focus group discussions conducted by Bhalla and her team with AI experts.