Wei et al. on Recommendations and Reporting Checklist for Rigorous & Transparent Human Baselines in Model Evaluations

Kevin Wei (RAND Corporation) et al. have posted “Recommendations and Reporting Checklist for Rigorous & Transparent Human Baselines in Model Evaluations” (A version of this paper has been accepted to ICML 2025 as a position paper (spotlight), with the title: “Position: Human Baselines in Model Evaluations Need Rigor and Transparency (With Recommendations & Reporting Checklist).”) on SSRN. Here is the abstract:

In this position paper, we argue that human baselines in foundation model evaluations must be more rigorous and more transparent to enable meaningful comparisons of human vs. AI performance, and we provide recommendations and a reporting checklist towards this end. Human performance baselines are vital for the machine learning community, downstream users, and policymakers to interpret AI evaluations. Models are often claimed to achieve “super-human” performance, but existing baselining methods are neither sufficiently rigorous nor sufficiently well-documented to robustly measure and assess performance differences. Based on a meta-review of the measurement theory and AI evaluation literatures, we derive a framework with recommendations for designing, executing, and reporting human baselines. We synthesize our recommendations into a checklist that we use to systematically review 115 human baselines (studies) in foundation model evaluations and thus identify shortcomings in existing baselining methods; our checklist can also assist researchers in conducting human baselines and reporting results. We hope our work can advance more rigorous AI evaluation practices that can better serve both the research community and policymakers. Data is available at: https://github.com/kevinlwei/human-baselines

Lahmann et al. on The Fundamental Rights Risks of Countering Cognitive Warfare with Artificial Intelligence

Henning Lahmann (Leiden U Centre for Law and Digital Technologies) et al. have posted “The Fundamental Rights Risks of Countering Cognitive Warfare with Artificial Intelligence” (Final version accepted and forthcoming in Ethics & Information Technology) on SSRN. Here is the abstract:

The article analyses proposed AI-supported systems to detect, monitor, and counter ‘cognitive warfare’ and critically examines the implications of such systems for fundamental rights and values. After explicating the notion of ‘cognitive warfare’ as used in contemporary public security discourse, it describes the emergence of AI as a novel tool expected to exacerbate the problem of adversarial activities against the online information ecosystems of democratic societies. In response, researchers and policymakers have proposed to utilise AI to devise countermeasures, ranging from AI-based early warning systems to state-run, internet-wide content moderation tools. These interventions, however, interfere, to different degrees, with fundamental rights and values such as privacy, freedom of expression, freedom of information, and self-determination. The proposed AI systems insufficiently account for the complexity of contemporary online information ecosystems, particularly the inherent difficulty in establishing causal links between ‘cognitive warfare’ campaigns and undesired outcomes. As a result, using AI to counter ‘cognitive warfare’ risks harming the very rights and values such measures purportedly seek to protect. Policymakers should focus less on seemingly quick technological fixes. Instead, they should invest in long-term strategies against information disorder in digital communication ecosystems that are solidly grounded in the preservation of fundamental rights.

Kemper & Kolain on K9 Police Robots: An Analysis Of Current Canine Robot Models Through The Lens Of Legitimate Citizen-Robot-State-Interaction

Carolin Kemper (German Research Institute for Public Administration) and Michael Kolain (German Research Institute for Public Administration (FÖV Speyer)) have posted “K9 Police Robots: An Analysis Of Current Canine Robot Models Through The Lens Of Legitimate Citizen-Robot-State-Interaction” (UCLA Journal of Law and Technology Vol. 30 (2025), 1-95, https://uclajolt.com/k9-police-robots-vol-30-no-1/) on SSRN. Here is the abstract:

The advent of a robotized police force has come: Boston Dynamics’ “Spot” patrols cities like Honolulu, investigates drug labs in the Netherlands, explores a burned building in danger of collapsing in Germany, and has already assisted the police in responding to a home invasion in New York City. Quadruped robots might soon be on sentry duty at US borders. The Department of Homeland Security has procured Ghost Robotics’ Vision 60—a model that can be equipped with different payloads, including a weapons system. Canine police robots may patrol public spaces, explore dangerous environments, and might even use force if equipped with guns or pepper spray. This new gadget is not unlike previous tools deployed by the police, especially surveillance equipment or mechanized help by other machines. Even though they slightly resemble the old-fashioned police dog, their functionalities and affordances are structurally different from K9 units: Canine robots capture data on their environment wherever they roam and they communicate with citizens, e.g. by replaying orders or by establishing a two-way audio link. They can be controlled fully through remote-control over a long distance—or they automate their patrol by following a preconfigured route. The law does currently not suitably address and contain these risks associated with potentially armed canine police robots.

As a starting point, we analyze the use of canine robots by the police for surveillance, with special regard to existing data protection regulation for law enforcement in the European Union (EU). Additionally, we identify overarching regulatory challenges posed by their deployment. In what we call “citizen-robot-state interaction,” we combine the findings of human-robot interaction with the legal and ethical requirements for a legitimate use of robots by state authorities, especially the police. We argue that the requirements of legitimate exercise of state authority hinge on how police use robots to mediate their interaction with citizens. Law enforcement agencies should not simply procure existing robot models used as military or industrial equipment. Before canine police robots rightfully roam our public and private spaces, police departments and lawmakers should carefully and comprehensively assess their purpose, which citizens’ rights they impinge on, and whether full accountability and liability is guaranteed. In our analysis, we use existing canine robot models “Spot” and “Vision 60” as a starting point to identify potential deployment scenarios and analyze those as “citizen-robot-state interactions.” Our paper ultimately aims to lay a normative groundwork for future debates on the legitimate use of robots as a tool of modern policing. We conclude that, currently, canine robots are only suitable for particularly dangerous missions to keep police officers out of harm’s way.

Coleman on Human Confrontation

Ronald J. Coleman (Georgetown U Law Center) has posted “Human Confrontation” (Wake Forest Law Review, Vol. 61, Forthcoming) on SSRN. Here is the abstract:

The U.S. Constitution’s Confrontation Clause ensures the criminally accused a right “to be confronted with the witnesses against” them. Justice Sotomayor recently referred to this clause as “[o]ne of the bedrock constitutional protections afforded to criminal defendants[.]” However, this right faces a new and existential threat. Rapid developments in law enforcement technology are reshaping the evidence available for use against criminal defendants. When an AI or algorithmic system places an alleged perpetrator at the scene of the crime or an automated forensic process produces a DNA report used to convict an alleged perpetrator, should this type of automated evidence invoke a right to confront? If so, how should confrontation be operationalized and on what theoretical basis?

Determining the Confrontation Clause’s application to automated statements is both critically important and highly under-theorized. Existing work treating this issue has largely discussed the scope of the threat to confrontation, called for more scholarship in this area, suggested that technology might not make the types of statements that would implicate a confrontation right, or found that direct confrontation of the technology itself could be sufficient.

This Article takes a different approach and posits that human confrontation is required. The prosecution must produce a human on behalf of relevant machine statements or such statements are inadmissible. Drawing upon the dignity, technology, policing, and confrontation literatures, it offers several contributions. First, it uses automated forensics to show that certain technology-generated statements should implicate confrontation. Second, it claims that for dignitary reasons only cross-examination of live human witnesses can meet the Confrontation Clause. Third, it reframes automation’s challenge to confrontation as a “humans in the loop” problem. Finally, it proposes a “proximate witness approach” that permits a human to testify on behalf of a machine, identifies an open set of principles to guide courts as to who can be a sufficient proximate witness, notes possible supplemental approaches, and discusses certain broader implications of requiring human confrontation. Human confrontation could check the power of the prosecution, aid system legitimacy, and ultimately act as a form of technology regulation.

Wells on Battlefield Evidence in the Age of Artificial Intelligence-Enabled Warfare

Winthrop Wells (International Institute for Justice and the Rule of Law) has posted “Battlefield Evidence in the Age of Artificial Intelligence-Enabled Warfare” (26 Chicago Journal of International Law 249 (2025)) on SSRN. Here is the abstract:

A number of emerging technologies increasingly prevalent on contemporary battlefields—notably unmanned autonomous systems (UAS) and various military applications of artificial intelligence (AI)—are working a sea change in the way that wars are fought. These technological developments also carry major implications for the investigation and prosecution of serious crimes committed in armed conflict, including for an under-examined yet potentially valuable form of evidence: information and material collected or obtained by military forces themselves.

Such “battlefield evidence” poses various legal and practical challenges. Yet it can play an important role in justice and accountability processes, in which it addresses the longstanding obstacle of law enforcement actors’ inability to access the conflict-torn crime scenes. Indeed, military-collected information and material has been critical to prosecutions of international crimes and terrorism offenses in recent years.

The present Article briefly surveys the historical record of battlefield evidence’s use. It demonstrates that previous technological advances—including in remote sensing, communications interception, biometrics, and digital data storage and analysis—not only enlarged and diversified the broader pool of military data but also had similar downstream effects on the (far) smaller subset of information shared and used for law enforcement purposes.

The Article then examines how current evolutions in the means and methods of warfare impact the utility of this increasingly prominent evidentiary tool. Ultimately, it is argued that the technical features of UAS and military AI give rise to significant, although qualified, opportunities for collection and exploitation of battlefield evidence. At the same time, these technologies and their broader impacts on the conduct of warfare risk inhibiting the sharing of such information and complicating its courtroom use.

Mastro et al. on Human vs. Machine: Behavioral Differences between Expert Humans and Language Models in Wargame Simulations

Oriana Mastro (Stanford U Freeman Spogli Institute for International Studies) et al. have posted “Human vs. Machine: Behavioral Differences between Expert Humans and Language Models in Wargame Simulations” (Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, volume 7, 2024 [10.1609/aies.v7i1.31681]) on SSRN. Here is the abstract:

To some, the advent of artificial intelligence (AI) promises better decision-making and increased military effectiveness while reducing the influence of human error and emotions. However, there is still debate about how AI systems, especially large language models (LLMs) that can be applied to many tasks, behave compared to humans in high-stakes military decision-making scenarios with the potential for increased risks towards escalation and unnecessary conflicts. To test this potential and scrutinize the use of LLMs for such purposes, we use a new wargame experiment with 107 national security experts designed to examine crisis escalation in a fictional US-China scenario and compare the behavior of human player teams to LLM-simulated team responses in separate simulations. Wargames have a long history in the development of military strategy and the response of nations to threats or attacks. Here, we find that the LLM-simulated responses can be more aggressive and significantly affected by changes in the scenario. We show a considerable high-level agreement in the LLM and human responses and significant quantitative and qualitative differences in individual actions and strategic tendencies. These differences depend on intrinsic biases in LLMs regarding the appropriate level of violence following strategic instructions, the choice of LLM, and whether the LLMs are tasked to decide for a team of players directly or first to simulate dialog between a team of players. When simulating the dialog, the discussions lack quality and maintain a farcical harmony. The LLM simulations cannot account for human player characteristics, showing no significant difference even for extreme traits, such as “pacifist” or “aggressive sociopath.” When probing behavioral consistency across individual moves of the simulation, the tested LLMs deviated from each other but generally showed somewhat consistent behavior. Our results motivate policymakers to be cautious before granting autonomy or following AI-based strategy recommendations.

Williams & Westlake on A Taste of Armageddon: Legal Considerations for Lethal Autonomous Weapons Systems

Paul R. Williams (Public International Law & Policy Group) and Ryan Jane Westlake (Independent) have posted “A Taste of Armageddon: Legal Considerations for Lethal Autonomous Weapons Systems” (Case Western Reserve Journal of International Law, volume 57, pg. 187, (2025)) on SSRN. Here is the abstract:

Lethal Autonomous Weapons Systems (LAWS) represent a profound shift in the nature of warfare, where machines, not humans, make life-or-death decisions on the battlefield. While these weapons offer strategic advantages, such as reducing human casualties and increasing operational efficiency, they also introduce significant legal, ethical, and accountability challenges. This Article explores the complexities surrounding the proliferation and use of LAWS, arguing that a total ban is unlikely due to the widespread accessibility and benefits these technologies offer to those who deploy them. Rather, this Article proposes the application of strict liability—traditionally a tort law concept—to the developers of LAWS as a means of promoting responsible development and ensuring accountability in the event a LAWS commits a war crime. By adapting this legal doctrine to the international criminal law context, the Article provides a pathway for holding those who design and deploy LAWS accountable for war crimes, thus bridging the gap between rapid technological advancement and the current limitations of international humanitarian law. The Article underscores the necessity of creative legal thinking to address the urgent and evolving challenges posed by autonomously lethal warfare technologies.

Müller et al. on Integrators at War: Mediating in AI-assisted Resort-to-Force Decisions

Dennis Müller (Centre for the Study of Existential Risk) et al. have posted “Integrators at War: Mediating in AI-assisted Resort-to-Force Decisions” on SSRN. Here is the abstract:

The integration of AI systems into the military domain is changing the way war-related decisions are made. It binds together three disparate groups of actors (developers, integrators, users) and creates a relationship between these groups and the machine, embedded in the (pre-)existing organisational and system structures. In this article, we focus on the important, but often neglected, group of integrators within such a sociotechnical system. In complex human-machine configurations, integrators carry responsibility for linking the disparate groups of developers and users in the political and military system. To act as the mediating group requires a deep understanding of the other groups’ activities, perspectives and norms. We thus ask which challenges and shortcomings emerge from integrating AI systems into resort-to-force (RTF) decision-making processes, and how to address them. To answer this, we proceed in three steps. First, we conceptualise the relationship between different groups of actors and AI systems as a sociotechnical system. Second, we identify challenges within such systems for human-machine teaming in RTF decisions. We focus on challenges that arise a) from the technology itself, b) from the integrators’ role in the sociotechnical system, and c) from the human-machine interaction. Third, we provide policy recommendations to address these shortcomings when integrating AI systems into RTF decision-making structures.

Lubin on Technology and the Law of Jus Ante Bellum

Asaf Lubin (Indiana U Maurer Law) has posted “Technology and the Law of Jus Ante Bellum” (26(1) Chicago Journal of International Law (forthcoming, 2025)) on SSRN. Here is the abstract:

The temporal boundaries of international rules governing military force are myopic. By focusing only on the initiation and conduct of war, the legal dichotomy between Jus Ad Bellum and Jus In Bello fails to address the critical role of peacetime military preparations in shaping future conflicts. Disruptive military technologies, such as artificial intelligence and cyber offensive capabilities, only further underscore this deficiency. During their pre-war development, these technologies embed countless design choices, hardcoding into their software and user interfaces policy rationales, legal interpretations, and value judgments. Once deployed in battle, these choices have the potential to precondition warfighters and set in motion violations of international humanitarian law (IHL).     

This article highlights glaring inadequacies in how the U.N. Charter, IHL, and International Criminal Law (ICL) currently regulate peacetime military preparations, particularly those involving disruptive technologies. The article juxtaposes these normative gaps with a growing literature in moral philosophy and theology advocating for Jus Ante Bellum (just preparation for war) as a new limb in the Just War Theory model. By reimagining international law’s temporalities, Jus Ante Bellum offers a proactive framework for addressing the risks posed by the development of disruptive military technologies. Without this recalibration, international law will continue to cede regulatory authority to the silent decisions made in the server farms of defense contractors and the fortified war rooms of central command, where algorithms and military strategies converge to dictate the contours of conflict long before it even begins.

Almenar et al. on The Protection of AI-Based Space Systems from a Data-Driven Governance Perspective

Roser Almenar (U Valencia Law) et al. have posted “The Protection of AI-Based Space Systems from a Data-Driven Governance Perspective” (75th International Astronautical Congress (IAC), Milan, Italy, 14-18 October 2024.) on SSRN. Here is the abstract:

Space infrastructures have long represented the pinnacle of technological and engineering achievements. This complexity has been further amplified by the advent of the new space race, where private actors are taking the lead, alongside states, in deploying thousands of satellites in outer space. The outer space environment of 2040 will look very different from today. Spacecraft will necessitate more frequent maneuvers to avoid potential collisions, with the need to be more conscious of their surroundings. Indeed, as the frequency of events and the number of space objects rises, decision-making tasks will increasingly challenge human operators, especially as physical and temporal margins diminish. Such complexity is enveloping thanks to the synergy of space technologies and Artificial Intelligence (AI), which is revolutionizing the functioning of space systems.

The forward trajectory clarifies the significance that AI in outer space will retain in the years ahead. The Corpus Juris Spatialis finds itself at a crossroads, faced with the defiance of withstanding the technological advances catalyzed by the impending integration of AI into all facets of space missions. Given the ubiquitous nature of AI, its implementation will invariably pose multifaceted legal challenges across diverse aspects of International Space Law. The acquired autonomy of space assets prompts crucial questions regarding the legal standards applicable to AI in outer space, and how these autonomous space systems should be protected against hostile interference.

The main purpose of this paper, presented by the Space Law and Policy Project Group of the Space Generation Advisory Council (SGAC), is to examine the pivotal legal dimensions stemming from the automation of space-based applications from a ‘data-driven governance’ standpoint. The increase in production and acquisition of space data will just augment the sophistication of AI systems, therefore necessitating their data assets to be reliable, accurate, and consistent to safeguard the long-term success of AI technologies in space missions. The paper aims to address the overarching legal challenges posed by the integration of AI into outer space operations, specifically on cybersecurity, intellectual property, and data governance, which are critical for safeguarding autonomous systems. By examining the various nuances of these domains, it seeks to contribute to a comprehensive understanding of the legal landscape of the current AI-space pairing. Ultimately, the conclusion will offer a set of recommendations to pave the way for a secure, ethical evolution of autonomous space systems in the near future.