Article 10 of Regulation (EU) 2024/1689 — Data and data governance. Official text, practical interpretation, key obligations and compliance implications.
Official Text Summary
Article 10 of Regulation (EU) 2024/1689 (the EU AI Act), located in Chapter III, Section 2, establishes mandatory data governance requirements for providers of high-risk AI systems. The article applies to training, validation, and testing datasets used in the development of such systems.
Under Article 10(2), data governance practices must address: the design choices behind data collection; data preparation operations including annotation, labelling, cleaning, enrichment, and aggregation; the formulation of relevant assumptions regarding intended use; and an examination of possible biases likely to affect health, safety, or fundamental rights.
Article 10(3) requires that training, validation, and testing datasets be relevant, sufficiently representative and, to the best extent possible, free of errors and complete in view of the intended purpose. Because freedom from errors is required only to the best extent possible, providers should document residual errors and their potential impact.
Article 10(4) requires datasets to take into account, to the extent required by the intended purpose, the characteristics particular to the specific geographical, contextual, behavioural, or functional setting in which the system is intended to be used. Article 10(5) addresses the processing of special categories of personal data, as defined in Article 9 of Regulation (EU) 2016/679 (GDPR) and Article 10 of Regulation (EU) 2018/1725: providers may exceptionally process such data only to the extent strictly necessary for bias detection and correction, subject to appropriate safeguards. Article 10(6) provides that, for high-risk AI systems developed without techniques involving the training of AI models, paragraphs 2 to 5 apply only to the testing datasets.
What This Means in Practice
Article 10 places concrete obligations on providers of high-risk AI systems placed on the market or put into service in the EU. In practice, compliance requires establishing and maintaining a structured data governance programme that spans the full development lifecycle.
For a provider developing an AI-assisted recruitment screening tool (an Annex III, point 4 system), Article 10 demands documenting why specific datasets were chosen, what preprocessing steps were applied, and how the datasets cover the demographic diversity of the intended applicant pool. If the training data underrepresents applicants from certain regional or ethnic backgrounds, the provider must identify this gap, assess the bias risk to equal treatment, and either correct the dataset or implement technical mitigations — all traceable in the technical documentation required by Article 11.
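A representativeness check of this kind can be sketched in a few lines of Python. This is a minimal illustration only: the function name `representation_gaps`, the group labels, the reference shares, and the 5-percentage-point tolerance are all assumptions made for the example. The regulation sets no numeric threshold, so the tolerance has to be chosen and justified by the provider.

```python
from collections import Counter


def representation_gaps(train_groups, reference_shares, tolerance=0.05):
    """Return groups whose share of the training data falls short of the
    reference share by more than `tolerance` (absolute difference).

    `tolerance` is a hypothetical, provider-chosen value; the AI Act does
    not prescribe one.
    """
    counts = Counter(train_groups)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        shortfall = expected - observed
        if shortfall > tolerance:
            gaps[group] = {
                "expected": expected,
                "observed": round(observed, 3),
                "shortfall": round(shortfall, 3),
            }
    return gaps


# Hypothetical applicant data: one group label per training record.
train_groups = ["north"] * 67 + ["south"] * 28 + ["east"] * 5
# Hypothetical shares of each group in the intended applicant pool.
reference_shares = {"north": 0.50, "south": 0.30, "east": 0.20}

gaps = representation_gaps(train_groups, reference_shares)
print(gaps)
```

Groups returned by the function are candidates for the gap analysis described above. The result shows where coverage falls short, not whether the shortfall creates a bias risk, which still needs a documented assessment.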
For a provider integrating a third-party AI model or dataset, Article 10 makes no exception for data the provider did not collect itself: the quality criteria apply whenever such datasets are used, so the provider still needs to examine the pre-built data's properties relative to the intended use case. Relying on an upstream vendor's data sheet is not sufficient in isolation; downstream providers must validate that the data properties align with their specific deployment context.
Practically, compliance teams should: maintain a data management plan per system; institute data lineage records; run documented bias and representativeness assessments at training, validation, and testing stages; and ensure any use of sensitive personal data for bias correction is covered by a lawful basis and appropriate access controls. These records form part of the technical documentation that must be available to national competent authorities on request.
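One way to keep lineage and preparation records is a small structured log per dataset. The sketch below is an assumption about how such a record might look, not a format prescribed by the regulation: the class names `DatasetLineage` and `PreparationStep`, the field names, and the sample dataset are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class PreparationStep:
    operation: str    # e.g. "cleaning", "labelling", "aggregation"
    description: str  # what was done
    rationale: str    # why it was done (design choice or assumption)
    performed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class DatasetLineage:
    dataset_name: str
    stage: str           # "training", "validation" or "testing"
    source: str          # where the data came from
    content_sha256: str  # fingerprint of the exact data version used
    steps: list = field(default_factory=list)

    def add_step(self, operation, description, rationale):
        self.steps.append(PreparationStep(operation, description, rationale))

    def to_json(self):
        # asdict() also converts the nested PreparationStep entries.
        return json.dumps(asdict(self), indent=2)


def fingerprint(raw_bytes: bytes) -> str:
    """SHA-256 hex digest identifying one exact version of a dataset."""
    return hashlib.sha256(raw_bytes).hexdigest()


# Hypothetical usage with a tiny in-memory dataset.
record = DatasetLineage(
    dataset_name="applicants_2023",
    stage="training",
    source="internal export (hypothetical)",
    content_sha256=fingerprint(b"id,region\n1,north\n"),
)
record.add_step(
    "cleaning",
    "Removed duplicate applicant rows",
    "Duplicates would overweight repeat applicants",
)
print(record.to_json())
```

Serialising the record to JSON makes it straightforward to store alongside the technical documentation and to produce on request.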
Key Obligations
- Data governance framework: Establish documented data governance and management practices covering all stages — training, validation, and testing — before placing a high-risk AI system on the market.
- Dataset quality criteria: Ensure datasets meet four cumulative quality criteria: relevance to the intended purpose, representativeness of the deployment context, freedom from errors (or documented residual error rates), and completeness.
- Bias examination: Conduct and document an examination of datasets for biases that could lead to discrimination or risks to health, safety, or fundamental rights, applying corrective measures where biases are identified.
- Data preparation documentation: Record all data preparation operations — annotation, labelling, cleaning, enrichment, and aggregation — along with the assumptions and design choices that shaped them.
- Restricted use of sensitive data: Process special categories of personal data for bias detection and correction only, subject to GDPR-equivalent safeguards, and only where strictly necessary and proportionate.
- Third-party and pre-existing datasets: Conduct a proportionate, technically feasible examination of any pre-existing or third-party datasets to verify they meet the requirements relevant to the provider's specific use case.
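For the completeness and residual-error criteria listed above, a simple report of missing values per field is one possible starting point. The sketch below is illustrative only: the function name `residual_error_report`, the field names, and the sample rows are assumptions, and missing values are only one kind of error a provider would need to examine.

```python
def residual_error_report(rows, required_fields):
    """Summarise rows with missing required fields.

    A value counts as missing if it is None or an empty string.
    """
    total = len(rows)
    missing_by_field = {name: 0 for name in required_fields}
    incomplete_rows = 0
    for row in rows:
        row_missing = [
            name for name in required_fields if row.get(name) in (None, "")
        ]
        for name in row_missing:
            missing_by_field[name] += 1
        if row_missing:
            incomplete_rows += 1
    return {
        "total_rows": total,
        "incomplete_rows": incomplete_rows,
        "incomplete_share": incomplete_rows / total if total else 0.0,
        "missing_by_field": missing_by_field,
    }


# Hypothetical records from a recruitment dataset.
rows = [
    {"id": 1, "region": "north", "label": "hire"},
    {"id": 2, "region": "", "label": "reject"},
    {"id": 3, "region": "south", "label": None},
    {"id": 4, "region": "east", "label": "hire"},
]
report = residual_error_report(rows, ["region", "label"])
print(report)
```

The resulting figures can be recorded as documented residual error rates, together with an assessment of their potential impact on the system's outputs.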
Relationship to Other Articles
Article 10 sits at the centre of the high-risk AI requirements framework and connects directly to several other provisions. Article 10 feeds into Article 9 (risk management system): risks identified through data governance, such as dataset bias, must be managed through the iterative risk management process. Article 11 (technical documentation) and Annex IV require providers to record data governance choices and dataset characteristics as part of the documentation examined in the conformity assessment, whether that assessment is carried out by a notified body or by the provider itself.
Article 13 (transparency and provision of information) relies on data documentation to enable meaningful instructions for use, particularly regarding known limitations arising from data quality gaps. Article 17 (quality management system) requires data governance to be institutionalised within the provider's broader quality processes.
For systems processing personal data, Article 10 intersects with GDPR obligations: Article 5 (principles relating to processing of personal data) and Article 25 (data protection by design and by default) are complementary requirements that providers must satisfy in parallel. Article 10(5) explicitly cross-references Regulation (EU) 2016/679 and Regulation (EU) 2018/1725 to delimit permissible processing of sensitive data.
Compliance Timeline
The EU AI Act entered into force on 1 August 2024, twenty days after publication in the Official Journal of the EU (OJ L 2024/1689, 12 July 2024). Article 10 follows the phased application schedule applicable to high-risk AI systems:
- 2 February 2025: Prohibitions on unacceptable-risk AI practices (Article 5) became enforceable; no direct obligation under Article 10 at this stage.
- 2 August 2025: GPAI model requirements (Chapter V) became applicable, along with the governance provisions, including the requirement for Member States to designate national competent authorities.
- 2 August 2026: Article 10 becomes fully enforceable for high-risk AI systems listed in Annex III (standalone high-risk applications, including biometrics, critical infrastructure, education, employment, essential services, law enforcement, migration, and the administration of justice). Providers must have compliant data governance documented and in place by this date.
- 2 August 2027: Article 10 becomes enforceable for high-risk AI systems regulated under Annex I (product-safety legislation, including machinery, medical devices, vehicles, and aviation). These systems must meet all data governance requirements within their product conformity assessments by this extended deadline.
Providers are strongly advised to begin data governance gap assessments well in advance of the applicable deadline, as remediation of training datasets and procurement of compliant third-party data can require significant lead time.
Official AI Act Compliance Deadline Calendar
Sources: Regulation (EU) 2024/1689 and the 2026 Digital Omnibus on AI.
| Obligation | Applies to | Status | Legal basis |
|---|---|---|---|
| Prohibited Practices (Art. 5) | All providers and deployers | active | AI Act Art. 5 |
| GPAI Rules (Chapter 5) | GPAI model providers | active | AI Act Art. 51-56 |
| High-risk AI — Annex III (standalone) | Providers of standalone Annex III systems | deferred | AI Omnibus 2026 Art. 6(2) |
| High-risk AI — Annex I (embedded) | AI embedded in Annex I regulated products | deferred | AI Omnibus 2026 Art. 6(1) |
| AI-Generated Content Marking | Providers of generative GPAI systems | active | AI Act Art. 50(2) |
| Regulatory Sandboxes | National competent authorities | active | AI Act Art. 57 |
Frequently Asked Questions
Article 10 requires providers of high-risk AI systems to implement data governance and management practices covering training, validation and testing datasets. This includes ensuring datasets are relevant, representative, free of errors, and complete relative to the intended purpose. Providers must also examine data for potential biases and document data collection and processing choices.
Article 10 applies to providers: natural or legal persons who develop high-risk AI systems, or have them developed, and place them on the market or put them into service under their own name or trademark. It covers any high-risk AI system listed in Annex III or covered by the Union harmonisation legislation listed in Annex I, regardless of whether the provider is established inside or outside the EU.
Limited exceptions exist for open-source models, but providers who place high-risk AI systems on the EU market using open-source components remain responsible for compliance with Article 10 in respect of the training, validation and testing data they use or curate for that system.
Article 10 applies to high-risk AI systems under Annex III from 2 August 2026. For high-risk AI systems regulated under the Union harmonisation legislation listed in Annex I, the deadline extends to 2 August 2027. Member States were required to designate national competent authorities, including market surveillance authorities, by 2 August 2025.
The regulation does not define a fixed statistical threshold. Providers must demonstrate, through documented analysis, that training data covers the intended geographic, demographic, contextual and operational conditions of deployment. Representation gaps must be identified, assessed for bias risk, and mitigated through technical measures or compensating safeguards documented in the technical file.