Article 55 of Regulation (EU) 2024/1689 — Evaluation and adversarial testing of general-purpose AI models with systemic risk. Official text, practical interpretation, key obligations and compliance implications.
Official Text Summary
Article 55 of Regulation (EU) 2024/1689 sets out additional obligations for providers of general-purpose AI (GPAI) models with systemic risk. These apply in addition to the obligations in Articles 53 and 54 and build on the systemic-risk classification in Article 51.
Paragraph 1 contains four obligations. Under point (a), providers must perform model evaluation in accordance with standardised protocols and tools reflecting the state of the art, including conducting and documenting adversarial testing (commonly referred to as red-teaming) with a view to identifying and mitigating systemic risks. Under point (b), they must assess and mitigate possible systemic risks at Union level, including their sources. Under point (c), they must keep track of, document, and report serious incidents and possible corrective measures to the AI Office and, as appropriate, to national competent authorities without undue delay. Under point (d), they must ensure an adequate level of cybersecurity protection for the model and its physical infrastructure.
The article does not prescribe who carries out adversarial testing; Recital 114 indicates that it may be done, as appropriate, through internal or independent external testing. Paragraph 2 allows providers to rely on codes of practice within the meaning of Article 56 to demonstrate compliance until a harmonised standard is published. The power to conduct independent evaluations of GPAI models, including through appointed experts, stems from Article 92 rather than from Article 55. Paragraph 3 makes information obtained under the article subject to the confidentiality rules of Article 78.
What This Means in Practice
For providers of frontier GPAI models, Article 55 imposes a structured and documented quality-assurance process focused specifically on identifying and mitigating systemic risks. In practice, this means that before releasing a qualifying model, and on a continuing basis after release, providers must run both standardised capability evaluations and targeted adversarial exercises designed to probe for catastrophic or widespread risks such as mass-scale manipulation, generation of weapons-related content, large-scale cyberattacks, or critical infrastructure disruption.
From an operational standpoint, compliance requires assembling or contracting multidisciplinary red-team capacity with expertise spanning AI safety, cybersecurity, disinformation, biosecurity, and other relevant domains. Evaluations must be conducted against benchmarks and protocols that reflect the current state of the art; providers cannot rely on proprietary, unpublished methodologies alone if standardised alternatives exist.
Documentation is central. Providers must maintain detailed records of each evaluation cycle — including scope, team composition, scenarios tested, outputs observed, and mitigations applied — and must be able to produce these records for the AI Office upon request. Where testing reveals new or aggravated systemic risks, providers are obliged to implement corrective measures and, where the risk is serious, to notify the AI Office without undue delay.
For example, a provider releasing a large multimodal model exceeding the 10^25 FLOP training threshold should schedule red-team exercises before launch covering at minimum: dual-use scientific knowledge elicitation, persuasive content generation at scale, and automated cyberattack facilitation. Post-launch, these exercises must recur whenever the model undergoes significant fine-tuning or capability updates.
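The record-keeping fields described in this section (scope, team composition, scenarios tested, outputs observed, mitigations applied) could be captured in a simple structured record. The following is a minimal Python sketch, not a format prescribed by the Regulation or the AI Office: the class name `EvaluationRecord`, the `cycle_id` and `conducted_on` fields, and all sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class EvaluationRecord:
    """One evaluation or red-team cycle (hypothetical structure)."""
    cycle_id: str
    conducted_on: date
    scope: str
    team_composition: list[str]
    scenarios_tested: list[str]
    outputs_observed: list[str] = field(default_factory=list)
    mitigations_applied: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialise the record so it can be handed over on request."""
        payload = asdict(self)
        payload["conducted_on"] = self.conducted_on.isoformat()
        return json.dumps(payload, indent=2)


# Illustrative values only.
record = EvaluationRecord(
    cycle_id="pre-launch-01",
    conducted_on=date(2025, 9, 1),
    scope="Automated cyberattack facilitation",
    team_composition=["internal red team", "external security reviewer"],
    scenarios_tested=["exploit generation", "phishing at scale"],
)
print(record.missing_fields())
print(record.to_json())
```

Checking `missing_fields()` before closing a cycle is one way to catch records that lack observed outputs or mitigations before they are archived.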
Key Obligations
- Standardised evaluations: Conduct model evaluations using state-of-the-art standardised protocols and tools, including those developed or endorsed by the AI Office, before market placement and on an ongoing basis.
- Adversarial testing (red-teaming): Carry out structured adversarial testing exercises — internally or via qualified external third parties — designed to surface systemic risks not identified through standard evaluation.
- Documentation and record-keeping: Maintain detailed records of evaluation methodology, scope, scenarios tested, outcomes, and any corrective measures taken, with records available to the AI Office on request.
- Reporting of significant findings: Notify the AI Office of serious or newly identified systemic risks discovered through evaluations or adversarial testing without undue delay.
- Cooperation with AI Office-commissioned testing: Facilitate and cooperate with independent adversarial testing organised or commissioned directly by the AI Office under its supervisory powers.
- Ongoing compliance post-release: Repeat evaluations and adversarial testing following significant model updates, fine-tuning, or changes to intended use cases that may alter the model's risk profile.
Relationship to Other Articles
Article 55 operates as the operational counterpart to the systemic-risk classification established in Article 51 and the general GPAI obligations set out in Article 53. It should be read alongside Article 52, which sets out the notification and designation procedure for systemic-risk models, and Article 54, which requires providers established in third countries to appoint an authorised representative in the Union. The serious-incident reporting duty in Article 55(1)(c) is specific to GPAI models with systemic risk; Article 73 sets out a separate incident reporting regime for high-risk AI systems. At the supervisory level, the AI Office's enforcement and monitoring role rests on Articles 88 and 89, and the power to conduct evaluations of GPAI models rests on Article 92. Providers should also consult Recitals 110 and 114, which describe the systemic risks at stake and the rationale for evaluating models before their first placing on the market.
Compliance Timeline
- 1 August 2024: Regulation (EU) 2024/1689 entered into force, starting the phased application clock.
- 2 February 2025: Prohibited AI practices (Chapter II) became applicable.
- 2 August 2025: Chapter V provisions governing general-purpose AI models, including Article 55, became applicable. Under Article 111(3), providers of GPAI models placed on the market before this date have until 2 August 2027 to comply.
- 2 August 2026: General application date under Article 113 as adopted, including the high-risk obligations for Annex III systems.
- 2 August 2027: Application date under Article 113 as adopted for Article 6(1) and the corresponding obligations (AI embedded in Annex I regulated products). The deadline calendar below lists both high-risk dates as deferred under the 2026 Digital Omnibus on AI.
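Article 55 is therefore already applicable to GPAI models with systemic risk placed on the market on or after 2 August 2025. Providers of such models that have not yet established evaluation and adversarial testing programmes risk being in breach of current obligations and should treat remediation as an immediate priority. Providers of models placed on the market before that date should plan to comply by 2 August 2027.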
Official AI Act Compliance Deadline Calendar
Sources: Regulation (EU) 2024/1689 and the 2026 Digital Omnibus on AI.
| Obligation | Applies to | Status | Legal basis |
|---|---|---|---|
| Prohibited Practices (Art. 5) | All providers and deployers | active | AI Act Art. 5 |
| GPAI Rules (Chapter V) | GPAI model providers | active | AI Act Art. 51-56 |
| High-risk AI, Annex III (standalone) | Providers of standalone Annex III systems | deferred | AI Omnibus 2026 Art. 6(2) |
| High-risk AI, Annex I (embedded) | AI embedded in Annex I regulated products | deferred | AI Omnibus 2026 Art. 6(1) |
| AI-Generated Content Marking | Providers of generative GPAI systems | active | AI Act Art. 50(2) |
| Regulatory Sandboxes | National competent authorities | active | AI Act Art. 57 |
Frequently Asked Questions
Adversarial testing, also known as red-teaming, refers to structured assessments in which experts attempt to elicit harmful, biased, or otherwise undesirable outputs from a general-purpose AI model. Article 55 requires providers of GPAI models with systemic risk to conduct such testing prior to placing the model on the market and on an ongoing basis thereafter, to identify and mitigate serious risks before they cause harm.
Article 55 applies exclusively to providers of general-purpose AI (GPAI) models classified as presenting systemic risk. Under Article 51, a model is presumed to have the high-impact capabilities that trigger this classification when the cumulative compute used for its training exceeds 10^25 floating-point operations (FLOPs); the European Commission can also designate a model on the basis of the criteria in Annex XIII. Providers of GPAI models below the threshold are not subject to Article 55 unless their model is so designated.
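As a rough illustration of the compute trigger, the sketch below compares an estimated training compute figure against the 10^25 threshold. The estimate uses the common rule of thumb of about 6 floating-point operations per parameter per training token for dense transformer models. That heuristic, the function names, and the example model sizes are assumptions for illustration and are not taken from the Regulation, which refers only to the compute actually used for training.

```python
# Threshold stated in the article: more than 10^25 floating-point operations.
SYSTEMIC_RISK_THRESHOLD_FLOP = 10**25


def estimate_training_flop(parameters: int, training_tokens: int) -> int:
    """Rough estimate: about 6 FLOP per parameter per token (assumed heuristic)."""
    return 6 * parameters * training_tokens


def exceeds_threshold(training_flop: int) -> bool:
    """True only when compute is strictly greater than the threshold."""
    return training_flop > SYSTEMIC_RISK_THRESHOLD_FLOP


# Hypothetical models, not real systems.
small = estimate_training_flop(70 * 10**9, 15 * 10**12)   # about 6.3e24
large = estimate_training_flop(400 * 10**9, 15 * 10**12)  # about 3.6e25

print(exceeds_threshold(small), exceeds_threshold(large))
```

Integer arithmetic is used so that the comparison at the boundary is exact. A real assessment would rely on measured training compute rather than this estimate.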
Article 55 does not prescribe who performs adversarial testing; Recital 114 indicates that it may be carried out, as appropriate, through internal or independent external testing. Separately, Article 92 empowers the AI Office to conduct its own evaluations of GPAI models, and the Commission may appoint independent experts to carry them out on its behalf. Results and methodologies should be documented so that they can be provided to the AI Office upon request.
The provisions governing general-purpose AI models, including Article 55, became applicable on 2 August 2025, twelve months after the Regulation entered into force on 1 August 2024. Under Article 111(3), providers who placed a GPAI model on the market before that date have until 2 August 2027 to achieve compliance with the obligations for GPAI models, including the systemic-risk obligations.
Non-compliance with the obligations for GPAI models with systemic risk — including the adversarial testing requirement in Article 55 — can attract administrative fines of up to 3% of global annual turnover, or EUR 15 million, whichever is higher. The AI Office, which has primary supervisory authority over GPAI providers, may also issue corrective measures, request additional documentation, or suspend market access in serious cases.
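The "whichever is higher" rule amounts to taking the maximum of two ceilings. A minimal Python sketch follows; the function name and the turnover figures are hypothetical, and the result is only the upper bound described above, not a prediction of any actual fine.

```python
from decimal import Decimal

FIXED_CEILING_EUR = Decimal("15000000")  # EUR 15 million
TURNOVER_RATE = Decimal("0.03")          # 3% of global annual turnover


def max_fine_eur(global_annual_turnover_eur: Decimal) -> Decimal:
    """Return the fine ceiling: the higher of 3% of turnover and EUR 15 million."""
    return max(TURNOVER_RATE * global_annual_turnover_eur, FIXED_CEILING_EUR)


# Hypothetical turnover figures.
# 3% of EUR 200 million is EUR 6 million, so the fixed ceiling applies.
print(max_fine_eur(Decimal("200000000")))
# 3% of EUR 2 billion is EUR 60 million, so the turnover-based ceiling applies.
print(max_fine_eur(Decimal("2000000000")))
```

The two ceilings coincide at a turnover of EUR 500 million; above that figure the percentage-based ceiling is the higher one.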