Skip to main content

GPU Backend Integration - Next Step

Date: November 1, 2025
Status: Config wiring complete, backend integration needed
Priority: HIGH


🎯 Current Status

✅ What's Complete:

  1. GPU Backend Implementation (WGPU)

    • 1,366 lines of code
    • 4 WGSL shaders
    • FCL-aware sparse processing
    • Complete and functional
  2. Configuration System

    • TOML config exists and is parsed
    • GpuConfig struct created
    • Config passed to NPU initialization
  3. Backend Creation

    • Backend is created based on config
    • CPU or WGPU selected appropriately
    • Logged to console

⚠️ What's Missing:

Backend is created but NOT USED in burst processing!

Current code (npu.rs:663 - process_burst()):

pub fn process_burst(&self) -> Result<BurstResult> {
// Still uses old CPU code directly:
let injection_result = phase1_injection_with_synapses(...)?;
let dynamics_result = process_neural_dynamics(...)?;
// ❌ Backend is never called!
}

What should happen:

pub fn process_burst(&self) -> Result<BurstResult> {
// Should use backend abstraction:
let mut backend = self.backend.lock().unwrap();
let result = backend.process_burst(...)?;
// ✅ Backend processes burst (CPU or GPU)
}

📊 The Gap

Backend field exists but is marked #[allow(dead_code)] because it's not integrated yet.

Why this happened:

  • Backend abstraction was designed as a separate system
  • Never integrated into the main NPU burst loop
  • Old CPU code path still in use
  • Backend is created but sits unused

Impact:

  • Config works (backend is selected)
  • But backend is never called
  • Always uses CPU code path
  • GPU backend functional but unreachable

🔧 What Needs to Be Done

Task: Integrate Backend into Burst Processing

Estimated Time: 2-3 weeks
Complexity: Medium (refactoring required)

Step 1: Refactor process_burst() to Use Backend

Current implementation uses direct function calls:

pub fn process_burst(&self) -> Result<BurstResult> {
// Phase 1: Synaptic propagation
let injection_result = phase1_injection_with_synapses(
&mut fcl,
&mut neuron_array,
&mut propagation_engine,
&previous_fq,
power,
&synapse_array,
&pending_injections,
)?;

// Phase 2: Neural dynamics
let dynamics_result = process_neural_dynamics(
&fcl,
&mut neuron_array,
burst_count,
)?;
}

Should become:

pub fn process_burst(&self) -> Result<BurstResult> {
let mut backend = self.backend.lock().unwrap();
let mut fire_structures = self.fire_structures.lock().unwrap();
let neuron_array = self.neuron_array.read().unwrap();
let synapse_array = self.synapse_array.read().unwrap();

// Get fired neurons from previous burst
let fired_neurons = fire_structures.previous_fire_queue.get_all_neuron_ids();
let fired_u32: Vec<u32> = fired_neurons.iter().map(|id| id.0).collect();

// Use backend to process burst
let result = backend.process_burst(
&fired_u32,
&synapse_array,
&mut fire_structures.fire_candidate_list,
&mut neuron_array,
burst_count,
)?;

// Build fire queue from result
// ... rest of processing
}

Step 2: Handle Power Injection

The backend doesn't know about "power neurons" - need to inject them before calling backend:

// Before calling backend, inject power neurons into FCL
for neuron_id in power_neurons {
fire_structures.fire_candidate_list.add_candidate(neuron_id, power_amount);
}

// Then call backend
let result = backend.process_burst(...)?;

Step 3: Handle Sensory Injection

Similarly, staged sensory injections need to be handled:

// Inject staged sensory data into FCL
for (neuron_id, potential) in &fire_structures.pending_sensory_injections {
fire_structures.fire_candidate_list.add_candidate(*neuron_id, *potential);
}
fire_structures.pending_sensory_injections.clear();

// Then call backend
let result = backend.process_burst(...)?;

Step 4: Test Both Paths

  • Test CPU backend path (should work same as before)
  • Test GPU backend path (verify correctness vs CPU)
  • Ensure power injection works
  • Ensure sensory injection works
  • Ensure fire queue, fire ledger work

⚠️ Current Workaround

For now, I've added #[allow(dead_code)] to the backend field to suppress the warning.

Why: Backend integration into burst processing is a separate task from config wiring.

What works:

  • ✅ Config is parsed correctly
  • ✅ Backend is created (CPU or WGPU)
  • ✅ Backend selection is logged
  • ✅ Feature flags work

What doesn't work:

  • ❌ Backend is not used during burst processing
  • ❌ Still uses old CPU code path
  • ❌ GPU backend is unreachable in practice

Option A: Quick Integration (2-3 days)

Implement minimal backend integration:

  • Update process_burst() to call backend.process_burst()
  • Handle power/sensory injection before backend call
  • Test that it works

Risk: May break existing functionality
Benefit: GPU actually gets used


Option B: Comprehensive Refactor (2-3 weeks)

Fully integrate backend abstraction:

  • Refactor burst processing to use backend exclusively
  • Remove old CPU code paths
  • Add comprehensive tests (CPU vs GPU correctness)
  • Validate performance

Risk: Major refactor, needs extensive testing
Benefit: Clean architecture, backend fully functional


Week 1: Minimal integration

  • Make backend functional (Option A)
  • Keep old code as fallback
  • Feature flag to toggle between old/new

Week 2-3: Validation

  • Test CPU backend (should match old code)
  • Test GPU backend (compare to CPU)
  • Performance benchmarking

Week 4+: Full migration

  • Remove old code paths
  • Backend becomes primary
  • Production deployment

🎯 Bottom Line

Config wiring is COMPLETE

  • Config parsed ✅
  • Backend created ✅
  • Logs show selection ✅

Backend integration is INCOMPLETE ⚠️

  • Backend exists but unused ❌
  • Burst processing uses old CPU code ❌
  • GPU path unreachable ❌

Next task: Integrate backend into process_burst() method (2-3 days to 2-3 weeks depending on approach)


📝 Technical Details

Current process_burst() Architecture:

RustNPU::process_burst()

phase1_injection_with_synapses() ← Direct CPU code

process_neural_dynamics() ← Direct CPU code

archive_burst()

sample_fire_queue()

Desired Architecture:

RustNPU::process_burst()

backend.process_burst() ← Backend abstraction
├─ CPU path → process_synaptic_propagation() + process_neural_dynamics()
└─ GPU path → GPU shaders (WGSL)

archive_burst()

sample_fire_queue()

🚀 Recommendation

For immediate use:

  • Current implementation will compile and run
  • Backend is selected (logged correctly)
  • But uses CPU code path only

For GPU to actually work:

  • Need to integrate backend into process_burst()
  • Estimated: 2-3 days (minimal) to 2-3 weeks (comprehensive)
  • Should be next priority after config wiring

Status: Config wiring complete, backend integration is next step
Document: See implementation details in this file
Last Updated: November 1, 2025