Hamdy & Hamdy on The Unseen Layers of AI: An Exploration of Poor Data Provenance in Model Training

Mohammad Hamdy (Almond Fintech) and Mona Hamdy (Independent) have posted “The Unseen Layers of AI: An Exploration of Poor Data Provenance in Model Training” on SSRN. Here is the abstract:

The opacity of AI model training presents a complex challenge with extensive implications. AI opacity threatens more than intellectual property rights: with limited visibility into the training process, users’ ability to assess output quality or biases is severely undermined, potentially leading to uninformed use, industry “monocultures,” and systemic risks. Economically, poor data provenance may exacerbate inequalities, privileging data providers in the Global North over their Global South counterparts, who face greater challenges in asserting their data rights. Additionally, it poses regulatory challenges for national authorities tasked with protecting citizens’ privacy, possibly triggering complex legal disputes and prompting risk-averse regulators to deny developers access to data. Consequently, these jurisdictions could be denied the opportunity to participate in AI development. Culturally, AI opacity hampers user assessment of model representativeness, which could threaten linguistic and cultural diversity and perpetuate the exclusion of certain cultures or groups.

This Policy Brief urges G20 countries to enhance data provenance in AI through regulatory and technological means. It provides an overview of various regulatory avenues for data provenance regulation and assesses their potential for success, highlighting the crucial role of the G20 in strengthening these standard-setting endeavors. The Policy Brief also explores the promise of emerging technologies in enhancing transparency within the AI sector and advocates for G20 support for these technologies as an additional means to promote transparency through market competition.