Cohen on Public Utility for What?: Governing AI Datastructures

Julie E. Cohen (Georgetown U Law Center) has posted “Public Utility for What?: Governing AI Datastructures” (Yale Journal of Law and Technology, vol. 27 (2025)) on SSRN. Here is the abstract:

Both in the U.S. and in Europe, initiatives for AI governance have focused principally on identifying and mitigating the risks created by AI models and their downstream uses rather than on those created by the datasets on which the models are trained. As this paper will explain, some of the most intractable dysfunctions of generative AI systems involve datasets. In particular, the very large datasets amassed by dominant providers of generative AI and related services are rapidly taking on infrastructural characteristics and importance. Effective AI governance therefore requires an infrastructural turn in thinking about data. 

First, the paper explains the significance of the infrastructure lens and sketches some of the distinctive implications of data infrastructures, in particular, for governance of networked digital processes and the social and economic activities that they facilitate. Next, it explores two interrelated problems manifesting within generative AI systems, simulation and sociopathy, that illustrate the extent to which the project of AI governance is, unavoidably, a data governance project. In brief, generative AI models trained on content from the public internet are also trained on data infrastructures that have been developed in particular ways for particular purposes and that encourage the production and spread of particular kinds of content. Last, the paper considers whether the concept of public utility, now the subject of growing interest among legal scholars who study various regulated industries, might supply a possible foundation for tackling the data governance problems associated with generative AI systems. The public utility model, however, addresses only some of the considerations that the infrastructure lens highlights. It is highly attuned to questions about access to infrastructures and their outputs but relatively insensitive to questions about infrastructure configuration and input sourcing. The problems of simulation and sociopathy belong in the latter category.