EU AI Act: Observability Requirements for LLM/GenAI Systems

Document Version: 1.2
Created: 2026-01-29
Updated: 2026-01-31
Source: EU AI Act (Regulation 2024/1689)


Overview

The EU AI Act entered into force on August 1, 2024, with a phased implementation timeline. This document summarizes the observability, logging, and documentation requirements relevant to LLM and GenAI systems.

Implementation Timeline

| Date | Requirements |
|------|--------------|
| Aug 2024 | Act enters into force |
| Feb 2025 | Prohibited AI practices apply |
| Aug 2025 | GPAI obligations (Articles 53, 55) |
| Aug 2026 | High-risk AI system requirements (Articles 12, 19) |

General-Purpose AI (GPAI) Requirements

Article 53: GPAI Provider Obligations

Effective: August 2, 2025

All GPAI model providers must:

  • Maintain technical documentation per Annex XI
  • Provide information to downstream providers per Annex XII
  • Establish copyright compliance policies
  • Publish training data summaries

Article 55: Systemic Risk GPAI Obligations

Effective: August 2, 2025

Models trained with cumulative training compute greater than 10^25 FLOPs additionally require:

  • Model evaluation using standardized protocols
  • Adversarial testing (red teaming)
  • Systemic risk tracking and mitigation
  • Cybersecurity protection
  • Incident reporting to EU AI Office

Annex XI: GPAI Technical Documentation Requirements

Applies to: All GPAI providers (including LLM providers)

Section 1: All GPAI Providers

1. General Description

| Element | Description |
|---------|-------------|
| Tasks | Intended tasks and AI system integration types |
| Acceptable Use | Policies governing permitted uses |
| Release Info | Date and distribution methods |
| Architecture | Model architecture and parameter count |
| I/O Format | Input/output modalities and formats |
| License | Licensing terms |

2. Design & Training Process

| Element | Description |
|---------|-------------|
| Technical Means | Infrastructure, tools, usage instructions for integration |
| Design Specifications | Training methodologies, key design choices, rationale, assumptions |
| Optimization | What the model optimizes for, parameter relevance |

3. Data Documentation

| Element | Description |
|---------|-------------|
| Data Sources | Type and provenance of training/test/validation data |
| Curation Methods | Cleaning, filtering, preprocessing techniques |
| Data Points | Number, scope, and main characteristics |
| Data Selection | How data was obtained and selected |
| Bias Detection | Methods to identify unsuitable sources and biases |

4. Compute & Energy

| Element | Description |
|---------|-------------|
| Compute Resources | FLOPs used for training |
| Training Time | Duration of training process |
| Energy Consumption | Known or estimated (can estimate from compute) |
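Annex XI allows energy consumption to be estimated from compute when it is not directly measured. The sketch below shows one such estimate; the hardware efficiency and PUE figures are illustrative assumptions, not values taken from the Act.

```javascript
// Rough training-energy estimate from total compute, per the Annex XI
// "known or estimated" allowance. flopsPerJoule is the effective hardware
// throughput (FLOPS per watt = FLOP per joule); pue is the data-center
// power usage effectiveness. Both values here are assumptions.
function estimateEnergyKWh(totalFlops, flopsPerJoule, pue = 1.2) {
  const joules = (totalFlops / flopsPerJoule) * pue;
  return joules / 3.6e6; // 1 kWh = 3.6e6 J
}

// Example: a 1e25 FLOP run at an assumed 1e12 FLOP/J effective efficiency
const kwh = estimateEnergyKWh(1e25, 1e12);
```

Real estimates would substitute measured accelerator efficiency and facility PUE; the point is only that Annex XI accepts a derived figure.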

Section 2: Systemic Risk GPAI (Additional)

| Element | Description |
|---------|-------------|
| Evaluation Strategies | Criteria, metrics, methodology for identifying limitations |
| Adversarial Testing | Red teaming, alignment, fine-tuning measures |
| System Architecture | Software component interactions, processing flow |

High-Risk AI System Requirements

Article 12: Record-Keeping

Effective: August 2, 2026

Core Requirements

  1. Automatic Logging Capability
    • Systems must technically enable automatic event recording (logs)
    • Logging must persist over the system’s entire lifetime
  2. Required Log Events
    • Situations that may present risk (per Article 79(1))
    • Substantial modifications to the system
    • Events relevant to post-market monitoring (Article 72)
    • Operational monitoring events (Article 26(5))
  3. Biometric Identification Systems (Annex III, point 1(a))
    • Session timestamps (start/end of each use)
    • Reference database against which input was checked
    • Input data that produced matches
    • Identity of humans who verified results (per Article 14(5))
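The biometric logging fields above can be sketched as a single event record. The field names and helper below are illustrative, not mandated by the Act or tied to any particular toolkit.

```javascript
// Sketch of an Article 12(3)-style usage record for a biometric
// identification system. All field names are hypothetical.
function makeUsageRecord({ sessionId, referenceDb, matches, verifierId }) {
  return {
    'session.id': sessionId,
    'session.start': new Date().toISOString(), // start of each use, Art. 12(3)(a)
    'session.end': null,                       // set when the use ends
    'reference.database': referenceDb,         // database checked, Art. 12(3)(b)
    'input.matches': matches,                  // input data producing matches, Art. 12(3)(c)
    'verifier.identity': verifierId,           // human verifier, Art. 12(3)(d)
  };
}
```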

Rationale (Recital 71)

“Having comprehensible information on how high-risk AI systems have been developed and how they perform throughout their lifetime is essential to enable traceability of those systems, verify compliance with the requirements under this Regulation, as well as monitoring of their operations and post market monitoring.”

Key points:

  • Technical documentation must be kept up to date throughout lifetime
  • Enables traceability and compliance verification
  • Supports post-market surveillance

Article 19: Automatically Generated Logs

Effective: August 2, 2026

  • Providers must retain logs automatically generated by their high-risk AI systems, to the extent the logs are under their control
  • Minimum retention period: 6 months (unless otherwise specified by Union or national law)
  • Deployers must likewise retain logs under their control (Article 26(6))
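A minimal sketch of the Article 19 retention rule, assuming log records carry an ISO-8601 `timestamp` field (the record shape is an assumption, not part of the Act):

```javascript
// Article 19 retention filter: keep records no older than RETENTION_DAYS.
// 180 days comfortably covers the 6-month statutory minimum.
const RETENTION_DAYS = 180;

function withinRetention(records, now = Date.now()) {
  const cutoff = now - RETENTION_DAYS * 24 * 60 * 60 * 1000;
  return records.filter((r) => Date.parse(r.timestamp) >= cutoff);
}
```

In practice a pipeline would use this cutoff to decide what may be pruned, never deleting anything newer than the statutory minimum.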

Observability Implementation Mapping

OTel GenAI Semantic Conventions Alignment

| EU AI Act Requirement | OTel GenAI Attribute/Event |
|-----------------------|----------------------------|
| Session timestamps | gen_ai.conversation.id + span timestamps |
| Model identification | gen_ai.response.model |
| Input logging | gen_ai.content.prompt event |
| Output logging | gen_ai.content.completion event |
| Tool/database references | gen_ai.tool.name, gen_ai.tool.call.id |
| Token usage | gen_ai.usage.input_tokens, gen_ai.usage.output_tokens |
| Request parameters | gen_ai.request.temperature, gen_ai.request.max_tokens |
| Finish reasons | gen_ai.response.finish_reasons |
| Provider identification | gen_ai.provider.name, gen_ai.system |
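As a sketch, the mapping above applied to one LLM call might look like the following. The helper and its input shape are assumptions; only the attribute keys come from the OTel GenAI semantic conventions.

```javascript
// Map a hypothetical LLM call record onto OTel GenAI span attributes.
// The `call` shape is illustrative; the attribute keys are the
// convention names from the mapping table.
function toGenAiAttributes(call) {
  return {
    'gen_ai.conversation.id': call.sessionId,
    'gen_ai.response.model': call.model,
    'gen_ai.usage.input_tokens': call.inputTokens,
    'gen_ai.usage.output_tokens': call.outputTokens,
    'gen_ai.request.temperature': call.temperature,
    'gen_ai.response.finish_reasons': call.finishReasons,
    'gen_ai.provider.name': call.provider,
  };
}
```

These attributes would normally be set on a span via an OTel SDK; the plain object form is used here only to make the mapping explicit.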

observability-toolkit Configuration

```javascript
// Recommended settings for EU AI Act compliance
{
  RETENTION_DAYS: 180,          // 6+ months per Article 19
  LOG_LEVEL: 'info',            // Capture operational events
  TRACE_CONTENT: true,          // Enable input/output logging
  SESSION_TRACKING: true,       // Track conversation sessions
}
```

Compliance Checklist

  • Enable automatic event logging for all AI system interactions
  • Capture session start/end timestamps
  • Log model version and configuration per request
  • Record input data and corresponding outputs
  • Track human verification events (if applicable) - see 1.8.6/BACKLOG.md
  • Implement 6+ month log retention (RETENTION_DAYS config)
  • Maintain technical documentation and keep it updated
  • Enable traceability via trace IDs and session IDs

Penalties

| Violation | Fine |
|-----------|------|
| Prohibited AI practices | Up to 35M EUR or 7% of global turnover |
| High-risk AI non-compliance | Up to 15M EUR or 3% of global turnover |
| Incorrect information to authorities | Up to 7.5M EUR or 1% of global turnover |
| GPAI provider violations | Up to 15M EUR or 3% of global turnover |
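For most operators, Article 99 caps fines at the higher of the fixed amount and the turnover percentage (SMEs get the lower of the two). A trivial sketch of the general rule:

```javascript
// Article 99 fine cap for non-SME operators: the higher of the fixed
// amount and the turnover share. Inputs in EUR; share as a fraction.
function maxFine(fixedEur, turnoverShare, annualTurnoverEur) {
  return Math.max(fixedEur, turnoverShare * annualTurnoverEur);
}

// Example: prohibited practices, 1B EUR turnover
maxFine(35e6, 0.07, 1e9); // → 70,000,000 EUR (7% exceeds the 35M floor)
```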

References

Official Sources

Article References

Annex References

Recitals


Document History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-01-29 | Initial research compilation |
| 1.1 | 2026-01-29 | Added Appendix A (session telemetry) and Appendix B (toolkit compliance) |
| 1.2 | 2026-01-31 | Updated to v1.8.5; marked evaluation events complete; updated compliance checklist |

Appendix A: Session Telemetry Data

This appendix demonstrates telemetry data captured during the research session that produced this document, showing how observability-toolkit captures EU AI Act-relevant data.

Session Overview

| Attribute | Value |
|-----------|-------|
| Session ID | a8a71f9f-58de-4733-b912-d677b14f1575 |
| Model | claude-opus-4-5-20251101 |
| Date | 2026-01-29 |
| Messages | 106 |
| Total Tokens | 85,385 |
| Context Utilization | 42.7% |

Token Breakdown

| Category | Tokens |
|----------|--------|
| System Prompt | 8,000 |
| System Tools | 15,000 |
| Messages | 62,385 |
| Cache Read | 85,123 |
| Cache Creation | 252 |

Cost Tracking

| Metric | Value |
|--------|-------|
| Input Cost | $0.0001 |
| Output Cost | $0.0006 |
| Total Cost | $0.0007 |

Sample Traces Captured

The following traces were captured during this session, demonstrating automatic event logging per Article 12 requirements:

1. MCP Tool Invocations

```
Trace ID: 464192682aa7f9cc25a9fa92bb136768
Span: hook:mcp-pre-tool
Duration: 4.35ms
Attributes:
  - mcp.server: observability-toolkit
  - mcp.tool: obs_query_traces
  - session.id: a8a71f9f-58de-4733-b912-d677b14f1575
  - service.name: claude-code-hooks
```

2. Web Research Tool Usage

```
Trace ID: d856db220dcee13d71c861488e76b9e4
Span: hook:mcp-post-tool
Duration: 2.38ms
Attributes:
  - mcp.server: webresearch
  - mcp.tool: visit_page
  - mcp.success: true
  - session.id: a8a71f9f-58de-4733-b912-d677b14f1575
```

3. File Operations

```
Trace ID: 2711401030067a7d545db286379692a7
Span: hook:builtin-post-tool
Duration: 4.50ms
Attributes:
  - builtin.tool: Write
  - builtin.category: file
  - builtin.success: true
  - session.id: a8a71f9f-58de-4733-b912-d677b14f1575
```

4. Token Metrics Extraction

```
Trace ID: 917fa2b09b9a4e4062bb5ad07737771c
Span: hook:token-metrics-extraction
Duration: 17.07ms
Attributes:
  - tokens.input: 883
  - tokens.output: 185
  - tokens.cache_read: 3,523,578
  - tokens.model: claude-opus-4-5-20251101
```

Historical Session Data

| Date | Avg Tokens | Sessions |
|------|------------|----------|
| 2026-01-27 | 60,000 | 3 |
| 2026-01-28 | 65,000 | 2 |
| 2026-01-29 | 128,580 | 8 |

Appendix B: observability-toolkit EU AI Act Compliance Assessment

Compliance Matrix

| EU AI Act Requirement | Article | observability-toolkit Capability | Status |
|-----------------------|---------|----------------------------------|--------|
| Automatic event logging | Art. 12(1) | Automatic trace/span recording via OTel | Supported |
| Session timestamps | Art. 12(3)(a) | session.id + span start/end times | Supported |
| Tool/database references | Art. 12(3)(b) | mcp.server, mcp.tool, gen_ai.tool.name | Supported |
| Input data logging | Art. 12(3)(c) | Content events, request parameters | Supported |
| Human verification tracking | Art. 12(3)(d) | Custom span attributes | Extensible |
| Log retention (6+ months) | Art. 19 | RETENTION_DAYS configuration | Configurable |
| Model identification | Annex XI | gen_ai.response.model, tokens.model | Supported |
| Provider identification | Annex XI | gen_ai.provider.name, gen_ai.system | Supported |
| Token usage tracking | Annex XI | tokens.input, tokens.output, gen_ai.usage.* | Supported |
| Cost estimation | Annex XI | Session cost breakdown | Supported |

Tool Capabilities Summary

Query Tools

| Tool | EU AI Act Use Case |
|------|--------------------|
| obs_query_traces | Retrieve logged events for compliance audits |
| obs_query_logs | Search operational logs by severity/session |
| obs_query_metrics | Aggregate usage metrics with percentiles |
| obs_query_llm_events | Query LLM-specific events and token usage |
| obs_query_evaluations | Query quality evaluation events with aggregations |
| obs_context_stats | Session-level context and cost analysis |

Compliance Tools

| Tool | EU AI Act Use Case |
|------|--------------------|
| obs_health_check | Verify telemetry system operational status |
| obs_get_trace_url | Generate shareable trace URLs for audits |
| obs_setup_claudeignore | Configure retention and exclusion policies |

OTel GenAI Semantic Conventions (v1.8.5)

observability-toolkit implements 10/10 OTel GenAI semantic convention attributes:

| Attribute | Implementation |
|-----------|----------------|
| gen_ai.operation.name | chat, embeddings, invoke_agent, execute_tool |
| gen_ai.provider.name | Fallback chain with gen_ai.system |
| gen_ai.conversation.id | Session correlation |
| gen_ai.response.model | Model version tracking |
| gen_ai.response.finish_reasons | Completion status |
| gen_ai.request.temperature | Request parameters |
| gen_ai.request.max_tokens | Request parameters |
| gen_ai.tool.name | Tool identification |
| gen_ai.tool.call.id | Tool invocation tracking |
| gen_ai.agent.id / gen_ai.agent.name | Agent identification |
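The "fallback chain" noted for gen_ai.provider.name can be illustrated with a small resolver. This function is a sketch of the idea, not the toolkit's actual implementation: the current convention attribute is preferred, and the older gen_ai.system attribute serves as a fallback for telemetry emitted by pre-1.x instrumentations.

```javascript
// Resolve provider identity from span attributes, preferring the
// current gen_ai.provider.name over the deprecated gen_ai.system.
// Illustrative only; not the toolkit's real resolver.
function resolveProvider(attrs) {
  return attrs['gen_ai.provider.name'] ?? attrs['gen_ai.system'] ?? 'unknown';
}
```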

Backend Support

| Backend | Traces | Metrics | Logs | Notes |
|---------|--------|---------|------|-------|
| Local JSONL | Yes | Yes | Yes | Default, file-based storage |
| SigNoz Cloud | Yes | Yes | Yes | OTLP export supported |
| Langfuse | Planned | Planned | N/A | Phase 4b roadmap |

Gaps & Roadmap

| Gap | EU AI Act Relevance | Status |
|-----|---------------------|--------|
| Evaluation events | Quality assurance | ✅ Implemented: obs_query_evaluations |
| Langfuse export | External audit tools | Planned: Phase 4b OTLP export utility |
| LLM-as-Judge hooks | Automated evaluation | Planned: Phase 4c webhook integration |
| Human verification spans | Art. 12(3)(d) | Extensible via custom span attributes |
```javascript
// Environment variables for EU AI Act compliance
{
  // Retention (Article 19)
  RETENTION_DAYS: 180,              // Minimum 6 months

  // Telemetry paths
  TELEMETRY_DIR: '~/.claude/telemetry',

  // SigNoz integration (optional)
  SIGNOZ_URL: 'https://ingest.us.signoz.cloud',
  SIGNOZ_API_KEY: '<your-key>',

  // Cache settings
  CACHE_TTL_MS: 60000,              // Query cache TTL
}
```

Conclusion

observability-toolkit v1.8.5 provides substantial coverage for EU AI Act observability requirements:

  • Article 12 (Record-Keeping): Full support for automatic event logging, session tracking, and tool invocation recording
  • Article 19 (Log Retention): Configurable retention with RETENTION_DAYS
  • Annex XI (Technical Documentation): Model, provider, and usage metrics captured automatically

v1.8.5 Security Enhancements (65+ commits since v1.8.0):

  • Circuit breaker for local backend resilience
  • SSRF protection with IPv6 zone ID handling
  • Rate limiter overflow prevention
  • Cloud environment detection warnings
  • ~100 negative security test cases
  • 2083 total tests (up from ~1700)