Unified Observability Architecture: Profiling, Logging, and Telemetry
Status: Architecture Extension
Date: 2025-10-31
Author: FEAGI Architecture Team
Executive Summary
This document extends the feagi-observability crate proposal to include profiling and telemetry alongside logging, creating a unified observability infrastructure. Profiling, logging, and telemetry share common context, correlation IDs, and initialization patterns, making them natural partners in a single crate.
Why Unified Infrastructure?
1. Shared Context and Correlation
Problem: Logs, traces, metrics, and profiles are often disconnected, which makes it hard to correlate them for a single request.
Solution: Unified correlation IDs propagate across all observability systems:
// Same correlation ID used for:
// - Logs: "request_id=abc123"
// - Traces: Span with trace_id="abc123"
// - Metrics: Label request_id="abc123"
// - Profiles: Profile metadata includes request_id="abc123"
Benefit: A request can be followed from log → trace → metric → profile through one shared ID.
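The propagation idea can be sketched with the standard library alone. This is an illustration under assumptions, not the crate's API: `with_correlation_id`, `current_correlation_id`, `log_line`, and `metric_labels` are hypothetical names, and a thread-local slot stands in for the real context layer.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical per-thread slot holding the active correlation ID.
    static CORRELATION_ID: RefCell<Option<String>> = RefCell::new(None);
}

/// Runs `f` with `id` installed as the current correlation ID and restores
/// the previous value afterwards (not restored if `f` panics).
pub fn with_correlation_id<T>(id: &str, f: impl FnOnce() -> T) -> T {
    let previous = CORRELATION_ID.with(|slot| slot.replace(Some(id.to_string())));
    let result = f();
    CORRELATION_ID.with(|slot| {
        slot.replace(previous);
    });
    result
}

/// Returns the current correlation ID, or "unknown" outside any scope.
pub fn current_correlation_id() -> String {
    CORRELATION_ID.with(|slot| slot.borrow().clone().unwrap_or_else(|| "unknown".to_string()))
}

/// Formats a log line tagged with the current ID.
pub fn log_line(message: &str) -> String {
    format!("request_id={} msg={:?}", current_correlation_id(), message)
}

/// Formats a metric name with a label carrying the current ID.
pub fn metric_labels(metric: &str) -> String {
    format!("{}{{request_id=\"{}\"}}", metric, current_correlation_id())
}
```

Every emitter reads the ID from the same slot, so a log line and a metric produced inside one `with_correlation_id` scope carry the same value without passing it explicitly.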
2. Unified Initialization
Problem: Multiple initialization calls are scattered across the codebase.
Solution: Single initialization function:
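use feagi_observability::{
    init_observability, LogFormat, LoggingConfig, ObservabilityConfig, ProfilingConfig,
    TelemetryConfig,
};

// Keep the returned manager alive for as long as observability is needed.
let _observability = init_observability(&ObservabilityConfig {
    logging: LoggingConfig { level: "info".to_string(), format: LogFormat::Json },
    telemetry: TelemetryConfig {
        metrics_enabled: true,
        tracing_enabled: true,
        tracing_endpoint: Some("http://jaeger:4317".to_string()),
        // Remaining fields (metrics_path, health_check_path, ...) use defaults
        ..Default::default()
    },
    profiling: ProfilingConfig {
        cpu_profiling: true,
        memory_profiling: false,
        output_dir: "./profiles".into(),
        // Remaining fields (sample_rate, chrome_tracing, ...) use defaults
        ..Default::default()
    },
})?;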
Benefit: All observability is configured in one place, which keeps settings consistent.
3. Shared Data Collection
Problem: Each system (logging, metrics, tracing) collects similar data separately.
Solution: Unified collection layer:
// Single instrumentation point collects:
// - Logs (structured)
// - Metrics (counters/histograms)
// - Traces (spans)
// - Profiling samples (if enabled)
#[instrument]
pub async fn execute_burst() {
// Automatically creates:
// - Log entry
// - Trace span
// - Metrics counter
// - Profiling sample (if enabled)
}
Benefit: Less overhead, consistent data collection.
4. Consistent Patterns
Problem: Different APIs for logging vs metrics vs tracing.
Solution: Unified macros:
// Same pattern for all observability
feagi_observability::burst_info!(
burst_id = 42,
neurons_fired = 1000,
// Automatically creates:
// - Log entry
// - Metric update
// - Trace span
// - Profiling sample (if enabled)
);
Benefit: Developers learn one API, not three.
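The document does not show how `burst_info!` is defined, so the following is only a sketch of one possible shape. The `emit_burst` helper and the `BURSTS_RECORDED` counter are hypothetical stand-ins for the real logging and metrics backends; a real definition would also need `#[macro_export]` to be usable from other crates.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counter standing in for a Prometheus metric.
pub static BURSTS_RECORDED: AtomicU64 = AtomicU64::new(0);

/// Hypothetical sink: bumps the counter and renders the fields as one
/// structured log line.
pub fn emit_burst(fields: &[(&str, String)]) -> String {
    BURSTS_RECORDED.fetch_add(1, Ordering::Relaxed);
    let body: Vec<String> = fields
        .iter()
        .map(|(key, value)| format!("{}={}", key, value))
        .collect();
    format!("burst {}", body.join(" "))
}

/// Accepts `name = value` pairs (trailing comma optional) and forwards
/// them to `emit_burst`, so one call site feeds both log and metric.
macro_rules! burst_info {
    ($($field:ident = $value:expr),* $(,)?) => {
        emit_burst(&[$((stringify!($field), $value.to_string())),*])
    };
}
```

With this sketch, `burst_info!(burst_id = 42, neurons_fired = 1000)` returns the rendered line and increments the counter once per call.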
Architecture: Unified Observability
┌─────────────────────────────────────────────────────────────┐
│                     feagi-observability                     │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Context Layer (Correlation IDs, Request Context)      │  │
│  │   - Trace ID propagation                              │  │
│  │   - Span context                                      │  │
│  │   - Request correlation                               │  │
│  └───────────────────────────────────────────────────────┘  │
│                              ↓                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Instrumentation Layer (Unified Collection)            │  │
│  │   - #[instrument] macro                               │  │
│  │   - Structured logging macros                         │  │
│  │   - Metrics macros                                    │  │
│  │   - Profiling hooks                                   │  │
│  └───────────────────────────────────────────────────────┘  │
│                              ↓                              │
│  ┌───────────────┐   ┌───────────────┐   ┌───────────────┐  │
│  │    LOGGING    │   │   TELEMETRY   │   │   PROFILING   │  │
│  │               │   │               │   │               │  │
│  │ - Structured  │   │ - Metrics     │   │ - CPU         │  │
│  │ - Spans       │   │ - Traces      │   │ - Memory      │  │
│  │ - Context     │   │ - Health      │   │ - Flamegraph  │  │
│  │               │   │               │   │               │  │
│  └───────────────┘   └───────────────┘   └───────────────┘  │
│                              ↓                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Export Layer (Backends)                               │  │
│  │   - Logs → stdout, file, syslog, ELK                  │  │
│  │   - Metrics → Prometheus                              │  │
│  │   - Traces → Jaeger, Zipkin                           │  │
│  │   - Profiles → Chrome DevTools, perf, flamegraph      │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
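The flow in the diagram, one instrumentation event fanned out to several backends, can be sketched as follows. The `Event`, `Backend`, and `Pipeline` names are hypothetical and chosen only for illustration; the actual crate may structure this with tracing layers instead.

```rust
/// A single observation produced by the instrumentation layer.
pub struct Event {
    pub correlation_id: String,
    pub name: String,
    pub duration_us: u64,
}

/// Hypothetical interface for one backend in the export layer.
pub trait Backend {
    /// Renders the event in the backend's own format.
    fn render(&self, event: &Event) -> String;
}

pub struct LogBackend;
pub struct MetricsBackend;

impl Backend for LogBackend {
    fn render(&self, event: &Event) -> String {
        format!(
            "INFO {} request_id={} duration_us={}",
            event.name, event.correlation_id, event.duration_us
        )
    }
}

impl Backend for MetricsBackend {
    fn render(&self, event: &Event) -> String {
        format!("{}_duration_us {}", event.name, event.duration_us)
    }
}

/// Fans one event out to every registered backend, in registration order.
pub struct Pipeline {
    backends: Vec<Box<dyn Backend>>,
}

impl Pipeline {
    pub fn new(backends: Vec<Box<dyn Backend>>) -> Self {
        Pipeline { backends }
    }

    pub fn dispatch(&self, event: &Event) -> Vec<String> {
        self.backends.iter().map(|backend| backend.render(event)).collect()
    }
}
```

The instrumentation point builds the event once; each backend decides which fields it cares about, which is the property the layered diagram relies on.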
Implementation: Profiling Module
Location: feagi-observability/src/profiling.rs
//! CPU and memory profiling for FEAGI
//!
//! Integrates with tracing for zero-overhead profiling when disabled.
use std::path::PathBuf;
use tracing::Instrument;
/// Profiling configuration
#[derive(Debug, Clone)]
pub struct ProfilingConfig {
/// Enable CPU profiling
pub cpu_profiling: bool,
/// Enable memory profiling
pub memory_profiling: bool,
/// Output directory for profiles
pub output_dir: PathBuf,
/// Profiling sample rate (0.0-1.0)
pub sample_rate: f64,
/// Enable Chrome DevTools tracing
pub chrome_tracing: bool,
/// Enable perf profiling (Linux only)
pub perf_profiling: bool,
}
impl Default for ProfilingConfig {
fn default() -> Self {
ProfilingConfig {
cpu_profiling: false,
memory_profiling: false,
output_dir: PathBuf::from("./profiles"),
sample_rate: 1.0,
chrome_tracing: false,
perf_profiling: false,
}
}
}
/// CPU profiler using tracing-chrome
pub struct CpuProfiler {
    /// Un-built Chrome layer. It records spans only after it is built and
    /// added to the tracing subscriber. The `Registry` subscriber type is an
    /// assumption about how the subscriber stack is assembled.
    #[cfg(feature = "chrome-tracing")]
    chrome_layer: Option<tracing_chrome::ChromeLayerBuilder<tracing_subscriber::Registry>>,
    output_path: PathBuf,
}
impl CpuProfiler {
    /// Create a new CPU profiler
    pub fn new(config: &ProfilingConfig) -> Result<Self, Box<dyn std::error::Error>> {
        let output_path = config.output_dir.join("trace.json");
        // Keep the builder instead of calling `.build()` here: `build()`
        // returns a (layer, flush guard) pair, not a builder.
        #[cfg(feature = "chrome-tracing")]
        let chrome_layer = if config.chrome_tracing {
            Some(tracing_chrome::ChromeLayerBuilder::new().file(&output_path))
        } else {
            None
        };
        Ok(CpuProfiler {
            // The field exists only when the feature is enabled, so its
            // initializer is gated the same way.
            #[cfg(feature = "chrome-tracing")]
            chrome_layer,
            output_path,
        })
    }
    /// Start profiling
    pub fn start(&mut self) -> Result<(), Box<dyn std::error::Error>> {
        // Implementation
        Ok(())
    }
    /// Stop profiling and save
    pub fn stop(&mut self) -> Result<PathBuf, Box<dyn std::error::Error>> {
        // Implementation
        Ok(self.output_path.clone())
    }
}
/// Memory profiler
pub struct MemoryProfiler {
enabled: bool,
samples: Vec<MemorySample>,
}
#[derive(Debug, Clone)]
pub struct MemorySample {
pub timestamp: chrono::DateTime<chrono::Utc>,
pub heap_size: usize,
pub allocations: usize,
pub deallocations: usize,
}
impl MemoryProfiler {
pub fn new(enabled: bool) -> Self {
MemoryProfiler {
enabled,
samples: Vec::new(),
}
}
pub fn sample(&mut self) {
if !self.enabled {
return;
}
// Sample memory usage
// Implementation would use heaptrack or similar
}
pub fn generate_report(&self) -> MemoryReport {
MemoryReport {
samples: self.samples.clone(),
peak_memory: self.samples.iter().map(|s| s.heap_size).max().unwrap_or(0),
total_allocations: self.samples.iter().map(|s| s.allocations).sum(),
}
}
}
#[derive(Debug)]
pub struct MemoryReport {
pub samples: Vec<MemorySample>,
pub peak_memory: usize,
pub total_allocations: usize,
}
/// Flamegraph generator
pub struct FlamegraphGenerator;
impl FlamegraphGenerator {
/// Generate flamegraph from trace
pub fn generate(&self, trace_path: &std::path::Path) -> Result<PathBuf, Box<dyn std::error::Error>> {
// Implementation would use inferno or similar
// For now, return the input path unchanged
Ok(trace_path.to_path_buf())
}
}
/// Instrument a function for profiling
#[macro_export]
macro_rules! profile {
($name:expr, $code:block) => {
{
let _guard = tracing::span!(tracing::Level::INFO, "profile", name = $name).entered();
$code
}
};
}
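`ProfilingConfig::sample_rate` is declared above but the listing does not show how it would be applied. One possible interpretation, sketched here as an assumption, is a deterministic sampler that lets through roughly `sample_rate` of all calls. The `Sampler` type is hypothetical and not part of the proposed module.

```rust
/// Deterministic sampler: adds `rate` to an accumulator on every call and
/// takes a sample each time the accumulator reaches 1.0.
pub struct Sampler {
    rate: f64,
    accumulator: f64,
}

impl Sampler {
    /// `rate` is clamped to the documented 0.0-1.0 range.
    pub fn new(rate: f64) -> Self {
        Sampler {
            rate: rate.clamp(0.0, 1.0),
            accumulator: 0.0,
        }
    }

    /// Returns true when the current call should be profiled.
    pub fn should_sample(&mut self) -> bool {
        self.accumulator += self.rate;
        if self.accumulator >= 1.0 {
            self.accumulator -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A counter-based scheme like this avoids a random-number dependency and spreads samples evenly; a randomized sampler would be the alternative if periodic workloads could alias with the fixed stride.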
Implementation: Telemetry Module
Location: feagi-observability/src/telemetry.rs
//! Unified telemetry collection for FEAGI
//!
//! Combines metrics, traces, and health checks into a single interface.
use crate::metrics::*;
use crate::tracing::*;
use std::sync::Arc;
/// Telemetry configuration
#[derive(Debug, Clone)]
pub struct TelemetryConfig {
/// Enable Prometheus metrics
pub metrics_enabled: bool,
/// Enable distributed tracing
pub tracing_enabled: bool,
/// Tracing endpoint (Jaeger OTLP)
pub tracing_endpoint: Option<String>,
/// Metrics endpoint path
pub metrics_path: String,
/// Health check endpoint
pub health_check_path: String,
/// System metrics collection interval (seconds)
pub system_metrics_interval: u64,
}
impl Default for TelemetryConfig {
fn default() -> Self {
TelemetryConfig {
metrics_enabled: true,
tracing_enabled: false,
tracing_endpoint: None,
metrics_path: "/metrics".to_string(),
health_check_path: "/health".to_string(),
system_metrics_interval: 5,
}
}
}
/// Unified telemetry collector
pub struct TelemetryCollector {
metrics_registry: Arc<prometheus::Registry>,
// With opentelemetry 0.21 the SDK types live in the separate opentelemetry_sdk crate
tracer: Option<opentelemetry_sdk::trace::Tracer>,
health_status: Arc<std::sync::RwLock<HealthStatus>>,
}
#[derive(Debug, Clone, serde::Serialize)]
pub struct HealthStatus {
pub status: String,
pub checks: Vec<HealthCheck>,
pub timestamp: chrono::DateTime<chrono::Utc>,
}
#[derive(Debug, Clone, serde::Serialize)]
pub struct HealthCheck {
pub name: String,
pub status: String,
pub message: String,
}
impl TelemetryCollector {
/// Create a new telemetry collector
pub fn new(config: &TelemetryConfig) -> Result<Self, Box<dyn std::error::Error>> {
let metrics_registry = Arc::new(prometheus::Registry::new());
// Register all metrics
crate::metrics::register_all_metrics(&metrics_registry);
// Initialize tracer if enabled
let tracer = if config.tracing_enabled {
Some(init_tracer(config.tracing_endpoint.as_ref())?)
} else {
None
};
Ok(TelemetryCollector {
metrics_registry,
tracer,
health_status: Arc::new(std::sync::RwLock::new(HealthStatus {
status: "healthy".to_string(),
checks: Vec::new(),
timestamp: chrono::Utc::now(),
})),
})
}
/// Get metrics registry
pub fn metrics_registry(&self) -> &Arc<prometheus::Registry> {
&self.metrics_registry
}
/// Export metrics as Prometheus text format
pub fn export_metrics(&self) -> String {
use prometheus::Encoder;
let encoder = prometheus::TextEncoder::new();
let mut buffer = Vec::new();
encoder.encode(&self.metrics_registry.gather(), &mut buffer).unwrap();
String::from_utf8(buffer).unwrap()
}
/// Update health status
pub fn update_health(&self, check: HealthCheck) {
let mut status = self.health_status.write().unwrap();
// Update or add check
if let Some(existing) = status.checks.iter_mut().find(|c| c.name == check.name) {
*existing = check.clone();
} else {
status.checks.push(check);
}
// Update overall status
status.status = if status.checks.iter().all(|c| c.status == "healthy") {
"healthy".to_string()
} else {
"degraded".to_string()
};
status.timestamp = chrono::Utc::now();
}
/// Get health status
pub fn get_health(&self) -> HealthStatus {
self.health_status.read().unwrap().clone()
}
}
/// Unified instrumentation macro
/// Creates log entry, trace span, and metric update
#[macro_export]
macro_rules! instrument {
($name:expr, $($field:ident = $value:expr),* $(,)?) => {
{
// Create trace span
let span = tracing::span!(
tracing::Level::INFO,
$name,
$($field = $value),*
);
let _guard = span.entered();
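// Log entry (each field carries its own trailing comma, so the macro
// also expands correctly when no fields are given)
tracing::info!(
$($field = $value,)*
"{}", $name
);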
// Metric update (if applicable)
// This would be context-aware based on the operation
// Profiling sample (if enabled)
// This would be handled by the profiling layer
}
};
}
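The roll-up rule in `update_health` can be exercised in isolation. The sketch below mirrors that logic with simplified, hypothetical types (`Check`, `Health`) and no locking, and the check names used with it ("npu", "zmq") are made up for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Check {
    pub name: String,
    pub status: String,
}

#[derive(Debug, Default)]
pub struct Health {
    pub checks: Vec<Check>,
}

impl Health {
    /// Overwrites the check with the same name, or appends a new one.
    pub fn update(&mut self, name: &str, status: &str) {
        let index = self.checks.iter().position(|check| check.name == name);
        match index {
            Some(i) => self.checks[i].status = status.to_string(),
            None => self.checks.push(Check {
                name: name.to_string(),
                status: status.to_string(),
            }),
        }
    }

    /// "healthy" only if every registered check is healthy.
    pub fn overall(&self) -> &'static str {
        if self.checks.iter().all(|check| check.status == "healthy") {
            "healthy"
        } else {
            "degraded"
        }
    }
}
```

One consequence of using `all` is that an empty check list reports "healthy", so a component that never registers a check cannot degrade the overall status.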
Implementation: Unified Initialization
Location: feagi-observability/src/init.rs
//! Unified initialization for all observability systems
use crate::logging::*;
use crate::telemetry::*;
use crate::profiling::*;
use crate::tracing::*;
use std::path::PathBuf;
use std::sync::Arc;
/// Unified observability configuration
#[derive(Debug, Clone)]
pub struct ObservabilityConfig {
pub logging: LoggingConfig,
pub telemetry: TelemetryConfig,
pub profiling: ProfilingConfig,
}
/// Unified observability manager
pub struct ObservabilityManager {
pub telemetry: Arc<TelemetryCollector>,
pub profiler: Option<CpuProfiler>,
pub memory_profiler: Option<MemoryProfiler>,
}
impl ObservabilityManager {
/// Initialize all observability systems
pub fn init(config: &ObservabilityConfig) -> Result<Self, Box<dyn std::error::Error>> {
// Initialize logging
init_logging(&config.logging)?;
// Initialize telemetry
let telemetry = Arc::new(TelemetryCollector::new(&config.telemetry)?);
// Initialize profiling
let profiler = if config.profiling.cpu_profiling {
Some(CpuProfiler::new(&config.profiling)?)
} else {
None
};
let memory_profiler = if config.profiling.memory_profiling {
Some(MemoryProfiler::new(true))
} else {
None
};
Ok(ObservabilityManager {
telemetry,
profiler,
memory_profiler,
})
}
/// Get telemetry collector
pub fn telemetry(&self) -> &Arc<TelemetryCollector> {
&self.telemetry
}
/// Start profiling
pub fn start_profiling(&mut self) -> Result<(), Box<dyn std::error::Error>> {
if let Some(ref mut profiler) = self.profiler {
profiler.start()?;
}
Ok(())
}
/// Stop profiling and save
pub fn stop_profiling(&mut self) -> Result<Option<PathBuf>, Box<dyn std::error::Error>> {
if let Some(ref mut profiler) = self.profiler {
Ok(Some(profiler.stop()?))
} else {
Ok(None)
}
}
}
/// Convenience function to initialize all observability
pub fn init_observability(config: &ObservabilityConfig) -> Result<ObservabilityManager, Box<dyn std::error::Error>> {
ObservabilityManager::init(config)
}
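A single initialization entry point usually also needs to reject a second call, since a global tracing subscriber can only be installed once per process. A minimal sketch of that guard, assuming a hypothetical `init_once` wrapper and `InitSummary` type that are not part of the listing above:

```rust
use std::sync::OnceLock;

/// Hypothetical record of what the first initialization configured.
#[derive(Debug, Clone, PartialEq)]
pub struct InitSummary {
    pub log_level: String,
    pub profiling: bool,
}

/// Process-wide slot that can be filled exactly once.
static OBSERVABILITY: OnceLock<InitSummary> = OnceLock::new();

/// Succeeds on the first call and returns an error on every later call,
/// leaving the first configuration in place.
pub fn init_once(log_level: &str, profiling: bool) -> Result<&'static InitSummary, String> {
    OBSERVABILITY
        .set(InitSummary {
            log_level: log_level.to_string(),
            profiling,
        })
        .map_err(|_| "observability already initialized".to_string())?;
    Ok(OBSERVABILITY.get().expect("value was set on the line above"))
}
```

Returning an error rather than silently ignoring the second call makes conflicting configurations visible to the caller.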
Updated Cargo.toml
[package]
name = "feagi-observability"
version.workspace = true
edition.workspace = true
authors.workspace = true
license.workspace = true
description = "Unified observability infrastructure for FEAGI (logging, telemetry, profiling)"
[dependencies]
# Logging
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
# Telemetry
prometheus = "0.13"
opentelemetry = { version = "0.21", optional = true }
opentelemetry-sdk = { version = "0.21", optional = true }
opentelemetry-otlp = { version = "0.14", optional = true }
tracing-opentelemetry = { version = "0.22", optional = true }  # 0.22 is the release line built against opentelemetry 0.21
# Profiling
tracing-chrome = { version = "0.6", optional = true }
pprof = { version = "0.12", optional = true }
# Errors
anyhow = "1.0"
thiserror.workspace = true
# Utilities
chrono = { version = "0.4", features = ["serde"] }  # "serde" is needed by #[derive(Serialize)] on HealthStatus
serde = { version = "1.0", features = ["derive"] }
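[features]
default = []
# `dep:` entries are required here: a feature named `opentelemetry`
# cannot list itself as one of its own members.
opentelemetry = [
"dep:opentelemetry",
"dep:opentelemetry-sdk",
"dep:opentelemetry-otlp",
"dep:tracing-opentelemetry",
]
# Matches the #[cfg(feature = "chrome-tracing")] gates in profiling.rs
chrome-tracing = ["dep:tracing-chrome"]
profiling = ["chrome-tracing", "dep:pprof"]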
Usage Example: Unified Observability
use feagi_observability::{
init_observability, ObservabilityConfig,
LoggingConfig, LogFormat,
TelemetryConfig,
ProfilingConfig,
};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Initialize all observability systems at once (mutable because
// start_profiling/stop_profiling take &mut self)
let mut observability = init_observability(&ObservabilityConfig {
logging: LoggingConfig {
level: "info".to_string(),
format: LogFormat::Json,
},
telemetry: TelemetryConfig {
metrics_enabled: true,
tracing_enabled: true,
tracing_endpoint: Some("http://jaeger:4317".to_string()),
..Default::default()
},
profiling: ProfilingConfig {
cpu_profiling: std::env::var("ENABLE_PROFILING").is_ok(),
chrome_tracing: std::env::var("CHROME_TRACING").is_ok(),
..Default::default()
},
})?;
// Use unified macros
use feagi_observability::{burst_info, instrument};
// This automatically creates:
// - Log entry
// - Trace span
// - Metric update
// - Profiling sample (if enabled)
burst_info!(
burst_id = 42,
neurons_fired = 1000,
synapses_activated = 5000
);
// Start profiling for specific operation
observability.start_profiling()?;
// ... perform operation ...
// Stop profiling
let profile_path = observability.stop_profiling()?;
if let Some(path) = profile_path {
println!("Profile saved to: {}", path.display());
}
Ok(())
}
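In the example above, an early return between `start_profiling` and `stop_profiling` would leave the profiler running. A scope guard is one way to pair the two calls automatically. The sketch below assumes hypothetical `ProfileScope` and `run_profiled` helpers and records only elapsed time into a caller-supplied sink; it is not the crate's profiler.

```rust
use std::time::{Duration, Instant};

/// Hypothetical RAII guard: when dropped, records how long it was alive.
pub struct ProfileScope<'a> {
    label: &'static str,
    started: Instant,
    sink: &'a mut Vec<(&'static str, Duration)>,
}

impl<'a> ProfileScope<'a> {
    pub fn start(label: &'static str, sink: &'a mut Vec<(&'static str, Duration)>) -> Self {
        ProfileScope {
            label,
            started: Instant::now(),
            sink,
        }
    }
}

impl Drop for ProfileScope<'_> {
    fn drop(&mut self) {
        // Runs on normal exit, early return, and unwinding alike.
        self.sink.push((self.label, self.started.elapsed()));
    }
}

/// Runs `work` inside a profiling scope and returns its result.
pub fn run_profiled<T>(
    label: &'static str,
    sink: &mut Vec<(&'static str, Duration)>,
    work: impl FnOnce() -> T,
) -> T {
    let _scope = ProfileScope::start(label, sink);
    work()
}
```

Because the recording happens in `Drop`, the measurement is captured even when `work` returns early through `?` inside a larger function.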
Benefits of Unified Infrastructure
1. Consistency
- ✅ Same correlation IDs across logs, traces, metrics, profiles
- ✅ Unified initialization pattern
- ✅ Consistent API design
2. Performance
- ✅ Shared infrastructure reduces overhead
- ✅ No overhead for subsystems compiled out via Cargo feature flags
- ✅ Efficient data collection
3. Developer Experience
- ✅ Learn one API, not three
- ✅ Single initialization call
- ✅ Clear examples and documentation
4. Correlation
- ✅ Can trace from log → trace → metric → profile
- ✅ Same context propagated everywhere
- ✅ Unified debugging experience
5. Maintainability
- ✅ Single crate to update
- ✅ Consistent patterns across codebase
- ✅ Easier to add new observability features
Migration Path
- Create feagi-observability with logging, telemetry, and profiling modules
- Implement unified initialization (init_observability)
- Migrate crates to use unified APIs
- Add profiling to performance-critical paths
- Enable telemetry in production deployments
Conclusion
Yes, profiling and telemetry should share infrastructure with logging.
They benefit from:
- Shared correlation IDs
- Unified initialization
- Consistent patterns
- Better performance
- Easier maintenance
The unified feagi-observability crate provides all three in a single, consistent API.