Article 10 of Regulation (EU) 2024/1689 — Data and data governance. Official text, practical interpretation, key obligations and compliance implications.

Official Text Summary

Article 10 of Regulation (EU) 2024/1689 (the EU AI Act), located in Chapter III, Section 2, establishes mandatory data governance requirements for providers of high-risk AI systems. Under Article 10(1), it applies to the training, validation, and testing data sets of high-risk AI systems that use techniques involving the training of AI models with data.

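Under Article 10(2), data governance and management practices must address, in particular: the relevant design choices; data collection processes and the origin of the data; data preparation operations such as annotation, labelling, cleaning, updating, enrichment, and aggregation; the formulation of assumptions about what the data are supposed to measure and represent; an assessment of the availability, quantity, and suitability of the data sets needed; an examination of possible biases likely to affect health, safety, or fundamental rights, together with measures to detect, prevent, and mitigate them; and the identification of data gaps or shortcomings and how they can be addressed.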

Article 10(3) requires that training, validation, and testing data sets be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose. Because the standard is "to the best extent possible" rather than absolute, providers should document residual errors and their potential impact.

Article 10(4) requires data sets to take into account, to the extent required by the intended purpose, the characteristics particular to the specific geographical, contextual, behavioural, or functional setting in which the high-risk AI system is intended to be used. Article 10(5) addresses the processing of special categories of personal data, as defined in Article 9 of Regulation (EU) 2016/679 (GDPR) and Article 10 of Regulation (EU) 2018/1725: providers may process such data only exceptionally, to the extent strictly necessary for bias detection and correction, and subject to appropriate safeguards for the fundamental rights and freedoms of natural persons. Article 10(6) provides that, for high-risk AI systems developed without techniques involving the training of AI models, paragraphs 2 to 5 apply only to the testing data sets.

What This Means in Practice

Article 10 places concrete obligations on providers, that is, organisations that develop a high-risk AI system (or have one developed) and place it on the EU market or put it into service there. In practice, compliance requires establishing and maintaining a structured data governance programme that spans the full development lifecycle.

For a provider developing an AI-assisted recruitment screening tool (an Annex III, point 4 system), Article 10 demands documenting why specific datasets were chosen, what preprocessing steps were applied, and how the datasets cover the demographic diversity of the intended applicant pool. If the training data underrepresents applicants from certain regional or ethnic backgrounds, the provider must identify this gap, assess the bias risk to equal treatment, and either correct the dataset or implement technical mitigations — all traceable in the technical documentation required by Article 11.
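The representation check described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `representation_gaps`, the `region` attribute, the reference shares, and the 5-percentage-point tolerance are all assumptions made for the example. The Regulation prescribes no particular method or threshold.

```python
from collections import Counter


def representation_gaps(records, attribute, reference_shares, tolerance=0.05):
    """Return groups whose share in `records` falls short of the reference
    share by more than `tolerance` (absolute difference in proportion).

    `reference_shares` is a hypothetical description of the intended
    applicant pool, e.g. taken from labour-market statistics.
    """
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if expected - observed > tolerance:
            gaps[group] = {"expected": expected, "observed": round(observed, 3)}
    return gaps


# Toy training set: 68 / 27 / 5 records per region (invented numbers).
training = (
    [{"region": "north"}] * 68
    + [{"region": "south"}] * 27
    + [{"region": "east"}] * 5
)
reference = {"north": 0.5, "south": 0.3, "east": 0.2}

gaps = representation_gaps(training, "region", reference)
print(gaps)  # only "east" is under-represented beyond the tolerance
```

A flagged group is a prompt for further analysis rather than a verdict: the provider would still need to assess the resulting bias risk and record the chosen correction or mitigation in the technical documentation.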

For a provider integrating a third-party AI model or dataset, Article 10 still applies: the quality criteria attach to the data sets used to develop the high-risk system, regardless of who originally collected them, and Article 10(2) expressly covers the origin of the data. Relying on an upstream vendor's data sheet is not sufficient in isolation; downstream providers must validate that the data properties align with their specific deployment context.

Practically, compliance teams should: maintain a data management plan per system; institute data lineage records; run documented bias and representativeness assessments at training, validation, and testing stages; and ensure any use of sensitive personal data for bias correction is covered by a lawful basis and appropriate access controls. These records form part of the technical documentation that must be available to national competent authorities on request.
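One way to keep the data lineage records mentioned above is a small structured record per dataset. The sketch below is a hypothetical design, not a format required by the Regulation: the `DatasetRecord` class, its field names, and the example values are assumptions chosen to mirror the items listed in Article 10(2).

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field

ALLOWED_ROLES = {"training", "validation", "testing"}


@dataclass
class DatasetRecord:
    """Hypothetical lineage record for one dataset used in development."""

    name: str
    role: str                 # "training", "validation" or "testing"
    origin: str               # where the data came from
    collection_purpose: str   # original purpose of collection
    preparation_steps: list = field(default_factory=list)
    known_gaps: list = field(default_factory=list)

    def __post_init__(self):
        if self.role not in ALLOWED_ROLES:
            raise ValueError(f"unknown dataset role: {self.role!r}")

    def add_step(self, operation, note):
        """Append a preparation step (e.g. cleaning, labelling) in order."""
        self.preparation_steps.append({"operation": operation, "note": note})

    def fingerprint(self):
        """SHA-256 over the serialised record, so later edits are detectable."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


record = DatasetRecord(
    name="applications-2023",          # invented example values
    role="training",
    origin="internal applicant tracking system",
    collection_purpose="recruitment administration",
)
before = record.fingerprint()
record.add_step("cleaning", "removed duplicate applications")
record.known_gaps.append("few applicants from region 'east'")
after = record.fingerprint()
print(before != after)  # the fingerprint changes when the record changes
```

Storing the fingerprint alongside each model version gives a simple way to show which state of the record a given training run relied on.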

Key Obligations

Relationship to Other Articles

Article 10 sits at the centre of the high-risk AI requirements framework and connects directly to several other provisions. It feeds into Article 9 (risk management system): risks identified through data governance, such as dataset bias, must be managed through the iterative risk management process. Article 11 (technical documentation) and Annex IV require providers to record data governance choices and dataset characteristics in the documentation that underpins conformity assessment, whether that assessment is carried out by a notified body or through the provider's own internal control.

Article 13 (transparency and provision of information to deployers) relies on data documentation to enable meaningful instructions for use, particularly regarding known limitations arising from data quality gaps. Article 17 (quality management system) requires data governance to be institutionalised within the provider's broader quality processes.

For systems processing personal data, Article 10 intersects with GDPR obligations: Article 5 (principles relating to processing, including accuracy) and Article 25 (data protection by design and by default) are complementary requirements that providers must satisfy in parallel. Article 10(5) explicitly cross-references Regulation (EU) 2016/679 and Regulation (EU) 2018/1725 to delimit permissible processing of sensitive data.

Compliance Timeline

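The EU AI Act entered into force on 1 August 2024, twenty days after publication in the Official Journal of the EU (OJ L 2024/1689, 12 July 2024). Article 10 follows the phased application schedule for high-risk AI systems. Under the Regulation as adopted, the requirements apply from 2 August 2026 for Annex III systems and from 2 August 2027 for systems covered by the Union harmonisation legislation listed in Annex I.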

Providers are strongly advised to begin data governance gap assessments well in advance of the applicable deadline, as remediation of training datasets and procurement of compliant third-party data can require significant lead time.

Official AI Act Compliance Deadline Calendar

Sources: Regulation (EU) 2024/1689 and the 2026 Digital Omnibus on AI.

| Obligation | Applies to | Status | Legal basis |
| --- | --- | --- | --- |
| Prohibited Practices (Art. 5) | All providers and deployers | active | AI Act Art. 5 |
| GPAI Rules (Chapter 5) | GPAI model providers | active | AI Act Art. 51-56 |
| High-risk AI, Annex III (standalone) | Providers of standalone Annex III systems | deferred | AI Omnibus 2026 Art. 6(2) |
| High-risk AI, Annex I (embedded) | AI embedded in Annex I regulated products | deferred | AI Omnibus 2026 Art. 6(1) |
| AI-Generated Content Marking | Providers of generative GPAI systems | active | AI Act Art. 50(2) |
| Regulatory Sandboxes | National competent authorities | active | AI Act Art. 57 |

Licence: CC BY 4.0

Frequently Asked Questions

Article 10 requires providers of high-risk AI systems to implement data governance and management practices covering training, validation and testing datasets. This includes ensuring datasets are relevant, sufficiently representative and, to the best extent possible, free of errors and complete relative to the intended purpose. Providers must also examine data for potential biases and document data collection and processing choices.

Article 10 applies to providers: natural or legal persons who develop high-risk AI systems, or have them developed, and place them on the market or put them into service under their own name or trademark. It covers any high-risk AI system listed in Annex III or covered by the Union harmonisation legislation listed in Annex I, regardless of whether the provider is established inside or outside the EU.

Limited exceptions exist for open-source models, but providers who place high-risk AI systems on the EU market using open-source components remain responsible for compliance with Article 10 in respect of the training, validation and testing data they use or curate for that system.

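Under the Regulation as adopted, Article 10 applies to high-risk AI systems under Annex III (excluding credit institutions) from 2 August 2026. For high-risk AI systems regulated under specific Union harmonisation legislation listed in Annex I, the deadline extends to 2 August 2027. The deadline calendar above lists both categories as deferred under the 2026 Digital Omnibus on AI, so providers should check the current status before planning against these dates. National market surveillance authorities began oversight responsibilities on 2 August 2025.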

The regulation does not define a fixed statistical threshold. Providers must demonstrate, through documented analysis, that training data covers the intended geographic, demographic, contextual and operational conditions of deployment. Representation gaps must be identified, assessed for bias risk, and mitigated through technical measures or compensating safeguards documented in the technical file.
