US Government to Safety Test New AI Models from Major Tech Firms

The U.S. Department of Commerce will now conduct safety tests on upcoming artificial intelligence (AI) models developed by tech giants Google, Microsoft, and xAI before their public release. The initiative follows voluntary agreements by these companies to submit their advanced AI systems for evaluation by the Commerce Department’s Center for AI Standards and Innovation (CAISI), expanding existing collaborations with other AI developers such as OpenAI and Anthropic.

Expanding AI Safety Collaborations

The new agreements mark a significant step in the government’s approach to AI oversight, particularly given the rapid advances in AI capabilities. The evaluations will encompass testing, collaborative research, and the development of best practices for commercial AI systems.

Chris Fall, director of CAISI, stated that these expanded industry collaborations are crucial for scaling the center’s work in the public interest during this critical period of AI development. CAISI has previously conducted 40 evaluations of AI tools, including tests of state-of-the-art models that have not yet been released to the public, though the specific models that were held back were not disclosed.

Major AI Models Under Scrutiny

Among the models expected to undergo testing is Google’s Gemini, developed by its DeepMind subsidiary, which is already integrated into Google products and used by U.S. defense and military agencies. Microsoft’s prominent AI tool, Copilot, and xAI’s chatbot Grok will also be subject to these safety assessments. Grok has recently faced public scrutiny over issues involving its image generation capabilities.

Industry and Government Perspectives

Microsoft acknowledged the importance of government collaboration in AI safety, noting in a corporate blog post that while the company conducts its own AI model testing, addressing national security and large-scale public safety risks requires a joint effort with government entities. Representatives from Google’s DeepMind and xAI (controlled by SpaceX) did not immediately provide comments on the new agreements.

The expansion of these safety testing agreements marks a shift from a more hands-off approach to AI regulation previously adopted by the Trump White House. President Trump’s administration had emphasized removing regulatory burdens to foster AI development and ensure U.S. leadership in the field.

Evolving AI Landscape and National Security Concerns

However, the increasing use of AI by the U.S. military and concerns raised by companies like Anthropic, which announced that it had developed a model named Mythos that is considered too powerful for public release, appear to be influencing the White House’s stance. Senior Trump administration staff members reportedly met with Anthropic CEO Dario Amodei recently.

This engagement occurs even as Anthropic faces a lawsuit from the U.S. Department of Defense concerning the company’s refusal to disable safety guardrails for government use of its AI models. The ongoing dialogue and testing initiatives underscore the growing recognition of the potential risks associated with advanced AI and the need for robust safety protocols.

Implications and Future Outlook

The inclusion of Google, Microsoft, and xAI in these safety testing protocols signals a more proactive governmental role in guiding the responsible development of cutting-edge AI technologies. The collaboration aims to balance innovation with the need to mitigate potential harms, from national security threats to broader public safety concerns. As these models undergo evaluation, the public and industry can anticipate a more transparent and potentially safer rollout of future AI capabilities. The focus will likely remain on how these governmental safety checks influence the pace and direction of AI innovation, and on whether similar frameworks will be adopted globally.