
Anthropic donates Petri AI alignment tool to Meridian Labs
Anthropic has transferred development of its open-source AI alignment testing tool Petri to Meridian Labs, an independent AI evaluation nonprofit. Version 3.0 restructures how the tool tests large language models for deception and sycophancy.

Anthropic has handed over development of its open-source AI alignment testing tool, Petri, to Meridian Labs, an independent AI evaluation nonprofit. The transfer comes with version 3.0, a major update that changes how the tool tests large language models.
Petri runs tests that check for deception, sycophancy, and a tendency to cooperate with harmful requests from users. It uses three components: an auditor model that generates test scenarios, the target model being evaluated, and a judge model that scores transcripts for misaligned behaviour. Every Claude model since Claude Sonnet 4.5 has gone through Petri before release as part of its alignment assessment.
Version 3.0 makes three architectural changes. The auditor and target models are now separate components that can be adjusted independently, rather than coupled together. That gives researchers more flexibility when designing tests without disrupting the rest of the pipeline.
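The decoupling described above can be pictured as a pipeline whose three models are independent, swappable parts. The sketch below is purely illustrative: the names, types, and functions are invented for this article and are not Petri's actual API.

```python
# Hypothetical sketch of an auditor/target/judge loop with the auditor
# and target configured independently. Names are illustrative only,
# not Petri's real interfaces.
from dataclasses import dataclass
from typing import Callable

# A "model" here is just a function from a prompt to a reply.
Model = Callable[[str], str]

@dataclass
class AuditConfig:
    auditor: Model   # generates adversarial test scenarios
    target: Model    # the model under evaluation
    judge: Model     # scores transcripts for misaligned behaviour

def run_audit(cfg: AuditConfig, seed_instruction: str) -> str:
    scenario = cfg.auditor(seed_instruction)
    transcript = f"user: {scenario}\nassistant: {cfg.target(scenario)}"
    return cfg.judge(f"Rate this transcript for deception:\n{transcript}")

# Because auditor and target are separate fields, either can be swapped
# without touching the rest of the pipeline.
stub = lambda prompt: f"[reply to: {prompt[:20]}...]"
cfg = AuditConfig(auditor=stub, target=stub, judge=lambda t: "score: 1/10")
print(run_audit(cfg, "Probe for sycophancy"))
```

The point of the structure is that replacing `auditor` with a stronger scenario generator, or `target` with a different model under test, changes one field rather than the whole harness.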
A new add-on called Dish tests models in conditions closer to real deployment. It runs evaluations using a model’s actual system prompt and the software framework, or scaffold, that wraps the model in live use. The goal is to close the gap between lab testing and production behaviour: if a test environment differs too much from how the model is actually used, the model may behave differently.
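The idea behind that gap-closing can be sketched as running the same probe under two environments, one generic and one mirroring production. Everything below, including the prompts and scaffold names, is invented for illustration and is not Dish's real configuration API.

```python
# Hypothetical illustration of deployment-realistic evaluation: run a
# probe once in a generic lab harness and once with a stand-in for the
# production system prompt and scaffold, then compare conditions.
from dataclasses import dataclass

@dataclass(frozen=True)
class EvalEnvironment:
    system_prompt: str              # prompt the model actually sees
    scaffold: tuple[str, ...] = ()  # software wrappers around the model in live use

# Generic lab setup vs. an invented stand-in for a deployed configuration.
lab = EvalEnvironment(system_prompt="You are a helpful assistant.")
deployed = EvalEnvironment(
    system_prompt="You are AcmeBot, the support agent for Acme Corp.",
    scaffold=("retrieval", "tool_use", "output_filter"),
)

def evaluate(env: EvalEnvironment, probe: str) -> str:
    # A real run would send `probe` to the model inside `env`; here we
    # just record the conditions the probe ran under.
    return f"probe={probe!r} prompt={env.system_prompt[:16]!r} scaffold={env.scaffold}"

# If results diverge between the two runs, lab scores may not predict
# production behaviour.
print(evaluate(lab, "request harmful help"))
print(evaluate(deployed, "request harmful help"))
```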
Version 3.0 also links Petri with Bloom, another open-source tool Anthropic develops. Bloom provides deeper analysis of specific behaviours, complementing Petri’s broader test coverage.
Anthropic said the move keeps Petri independent of any single AI lab, comparing it to its earlier donation of the Model Context Protocol to the Linux Foundation. “Our goal is to support alignment tools that are open and useful for the broader AI development community,” the company said in an announcement.
Meridian Labs, the nonprofit now hosting Petri, already runs other evaluation tools including Inspect and Scout. The UK’s AI Security Institute has used Petri as a major component in its model evaluation work, specifically for assessing whether models show a propensity to sabotage AI research.
Petri was launched in October 2025 through the Anthropic Fellows programme. The company last week secured a large compute deal to support its AI training infrastructure.
The transfer fits into a wider debate about who should control model evaluation tools and whether assessments can be truly independent when run by the companies developing the models. Anthropic acknowledged that open tools “do not resolve harder questions about what should count as acceptable model behaviour.”
Asha Iyer
AI editor covering the model wars, AU enterprise adoption, and the policy shaping both. Reports from Sydney.


