Published Date: 10/2/2025
Republican Sen. Josh Hawley and Democratic Sen. Richard Blumenthal have unveiled a sweeping new bill that would require the most powerful AI developers to submit their models to the government for testing before they can be made available to the marketplace. The Artificial Intelligence Risk Evaluation Act (AIREA) would establish an Advanced Artificial Intelligence Evaluation Program inside the Department of Energy (DOE) and make participation mandatory for any developer whose systems cross a defined compute threshold.
In contrast to the patchwork of voluntary commitments and agency guidance that has characterized U.S. AI policy to date, the proposal would condition deployment on compliance, impose steep daily fines for violators, and task DOE with producing a roadmap for a permanent federal oversight regime. Hawley and Blumenthal cast their legislation as a bid to get ahead of catastrophic risks while grounding future rulemaking in empirical testing rather than marketing claims or worst-case speculation.
“As Big Tech companies continue to develop new generations of artificial intelligence, the wide-ranging risks of their technology continue to grow unchecked and underreported,” Hawley said. “Simply stated, Congress must not allow our national security, civil liberties, and labor protections to take a back seat to AI,” he added, noting that their bipartisan legislation would guarantee common-sense testing and oversight of the most advanced AI systems, so Congress and the American people can be better informed about potential risks.
“AI companies have rushed to market with products that are unsafe for the public and often lack basic due diligence and testing,” Blumenthal added. “Our legislation would ensure that a federal entity is on the lookout, scrutinizing these AI models for threats to infrastructure, labor markets, and civil liberties while providing the public with the information necessary to benefit from AI promises, while avoiding many of its pitfalls.”
The bill opens with an unusually blunt warning that rapidly advancing AI capabilities pose “significant risks to national security, public safety, economic competitiveness, civil liberties, and healthy labor and other markets.” It argues that as systems approach human-level performance “in virtually all domains,” the U.S. needs a secure testing program to generate data-driven options for managing emerging risks.
Within 90 days of enactment, DOE would have to launch standardized – and in some cases classified – evaluations designed to probe the likelihood of “adverse AI incidents,” a defined term that spans loss-of-control scenarios, weaponization by foreign adversaries, threats to critical infrastructure, “scheming behavior,” and serious erosions of civil liberties and competition. The bill spells out which systems fall under its authority. An “advanced artificial intelligence system” is defined by the total amount of computational power used to train it. Any model that required more than 10^26 operations to train crosses the threshold. That bar, set deliberately high, is meant to capture only frontier-class models rather than routine commercial AI.
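To make the trigger concrete, here is a minimal, purely illustrative sketch of how such a compute check might be expressed, assuming the widely used rule of thumb that dense-transformer training compute is roughly 6 × parameters × training tokens. The threshold value is the one described in the bill; the function names and example model sizes below are hypothetical and are not drawn from the legislation.

```python
# Illustrative sketch only: rough-checking whether a training run would cross
# the bill's 10^26-operation compute trigger, using the common approximation
# that dense transformer training costs ~6 FLOPs per parameter per token.

THRESHOLD_OPS = 1e26  # compute trigger as described in the bill


def estimated_training_ops(parameters: float, training_tokens: float) -> float:
    """Rough estimate of total training operations for a dense model."""
    return 6.0 * parameters * training_tokens


def crosses_threshold(parameters: float, training_tokens: float) -> bool:
    """True if the rough estimate exceeds the 10^26-operation trigger."""
    return estimated_training_ops(parameters, training_tokens) > THRESHOLD_OPS


if __name__ == "__main__":
    # Hypothetical frontier-scale run: 2 trillion parameters, 40 trillion tokens.
    ops = estimated_training_ops(2e12, 40e12)
    print(f"Estimated training compute: {ops:.2e} ops")   # ~4.8e26
    print("Covered?", crosses_threshold(2e12, 40e12))     # True

    # Hypothetical mid-size run: 70 billion parameters, 15 trillion tokens.
    ops = estimated_training_ops(70e9, 15e12)
    print(f"Estimated training compute: {ops:.2e} ops")   # ~6.3e24
    print("Covered?", crosses_threshold(70e9, 15e12))     # False
```

Any real determination would of course depend on how the bill and implementing rules define and measure "operations," but the sketch shows why a fixed compute line catches only the largest training runs.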
The Secretary of Energy could propose a new definition, but it would not take effect unless Congress approved it by joint resolution. In practice, this compute-based standard is designed to single out the most consequential systems while anticipating future shifts: as algorithmic efficiency improves, lawmakers have reserved the right to decide when and how to recalibrate the line. The obligations themselves are unusually muscular. Covered developers “shall” participate in the program and must provide, on request, code, training data, model weights, architectural details, and other technical materials needed for evaluation.
Deployment of a covered model “in interstate or foreign commerce” – including open-source release – would be flatly prohibited unless the developer is complying. The penalty for violating either the participation mandate or the deployment ban is not less than $1 million per day. That figure is high enough to bite even the sector’s largest companies and would likely deter attempts to treat fines as a cost of doing business. Inside DOE, the new program would run adversarial red team testing that “matches or exceeds” anticipated jailbreak and exploit techniques and, where feasible, arrange independent third-party assessments and blind model evaluations to increase reliability.
Developers would receive formal risk reports, and the statute also directs DOE to develop “containment protocols, contingency planning, and mitigation strategies” and to inform the creation of evidence-based standards, licensing procedures, and governance mechanisms drawing directly on test data. Perhaps the most controversial clause instructs DOE to “develop proposed options for regulatory or governmental oversight, including potential nationalization or other strategic measures” should testing suggest that artificial superintelligence is likely to arise.
Hawley and Blumenthal argue that contemplating extreme scenarios is a feature, not a bug. Both lawmakers maintain that there remains bipartisan appetite to tackle the highest-risk AI even as the White House warns that heavy regulation could slow innovation and undermine competitiveness with China. Meanwhile, Texas Senator Ted Cruz continues to push for federal rules for AI to head off what he calls a chaotic patchwork of state laws. The stakes are high given that the goal of Cruz and the White House is to rein in the states.
Anthropic CEO Dario Amodei publicly criticized a 10-year preemption on state AI regulation as “too blunt,” while urging Congress to require developers of powerful models to disclose testing methods and publish risk mitigation plans before release, arguing that some degree of pre-deployment accountability is necessary as models scale. Amodei’s remarks don’t endorse this bill specifically, but they underscore a nascent consensus that transparency and testing can’t be optional at the frontier.
Policy experts say the devil is in the details. Daniel Ho of Stanford’s Institute for Human-Centered AI has cautioned that “pre-deployment testing is important, but it is not enough.” Ho urged lawmakers to pair front-end checks with adverse-event reporting so government and the public can “learn about the technology as it evolves and respond when things go wrong.” His view highlights a gap the Hawley-Blumenthal bill only partly addresses: it requires annual DOE updates to Congress but does not establish a standing incident-reporting system.
Others see the DOE venue as both pragmatic and provocative. DOE has deep expertise in high-performance computing and national security research, and it already runs some of the world’s most capable supercomputers. But placing pre-market approvals inside an energy agency rather than, say, a new independent commission will invite scrutiny from civil-liberties advocates and industry alike. Sarah Myers West of the AI Now Institute has repeatedly warned that federal AI policy is at risk of being shaped by the largest companies’ agendas and has argued that Washington must curb deregulatory “wish lists” and keep the public interest front-and-center.
Within 360 days after becoming law, the Secretary of DOE would have to submit to Congress a detailed recommendation for a permanent oversight framework that is explicitly tethered to the program’s test results. The plan would have to analyze observed capabilities and risks – including “weaponization potential, self-replication capabilities, scheming behaviors, autonomous decision making, and automated AI development capabilities” – and propose “evidence-based standards, certification procedures, licensing requirements, and regulatory oversight structures.” It would also have to outline proposals for “automated and continuous monitoring” of hardware usage, compute inputs, and cloud deployments, reflecting mounting interest in compute-based “circuit breakers” for dangerous capability jumps.
The DOE secretary would be required to update that plan annually. The pilot program would sunset after seven years unless Congress renewed it. The bill’s compute trigger of 10^26 operations will be another flashpoint. Compute-based thresholds have the virtue of being relatively observable compared to fuzzy capability tests and correlate with emergent model behavior, but they’re imperfect. Algorithmic advances can deliver big capability gains without proportional increases in training FLOPs, and some high-risk systems might sit below the line. That likely explains the built-in mechanism allowing the Energy secretary to propose a new definition, subject to congressional approval, to keep the trigger calibrated to reality.
Internationally, the approach rhymes with the EU AI Act’s risk tiers and with California’s recent frontier-AI policy report emphasizing pre-deployment testing and transparency for external evaluations.
Q: What is the Artificial Intelligence Risk Evaluation Act (AIREA)?
A: The AIREA is a new bill introduced by Sen. Josh Hawley and Sen. Richard Blumenthal that aims to mandate federal oversight and testing for advanced AI systems before they can be deployed in the marketplace.
Q: What is the compute threshold for advanced AI systems under the AIREA?
A: The compute threshold for advanced AI systems under the AIREA is set at 10^26 operations, which is designed to capture only the most advanced models.
Q: What penalties are proposed for non-compliance with the AIREA?
A: The penalty for violating the participation mandate or the deployment ban under the AIREA is not less than $1 million per day.
Q: How does the AIREA plan to manage and mitigate risks associated with AI?
A: The AIREA plans to manage and mitigate risks by conducting standardized and classified evaluations, developing containment protocols, and proposing evidence-based standards and governance mechanisms.
Q: What is the role of the Department of Energy (DOE) in the AIREA?
A: The DOE would be responsible for launching the Advanced Artificial Intelligence Evaluation Program, conducting adversarial red team testing, and developing recommendations for a permanent oversight framework.