GPU Backend Integration - Next Step

Date: November 1, 2025
Status: Config wiring complete, backend integration needed
Priority: HIGH

🎯 Current Status

✅ What's Complete:

GPU Backend Implementation (WGPU)
- 1,366 lines of code
- 4 WGSL shaders
- Sparse processing that is aware of the fire candidate list (FCL)
- Complete and functional

Configuration System
- TOML config exists and is parsed
- GpuConfig struct created
- Config passed to NPU initialization

Backend Creation
- Backend is created based on config
- CPU or WGPU selected appropriately
- Selection is logged to console
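
The selection step described above can be sketched roughly as follows. This is only an illustration: the `use_gpu` field, the `BackendKind` enum, the `gpu_available` argument, and `select_backend` are hypothetical names, since the real `GpuConfig` fields and selection logic are not shown in this document.

```rust
/// Hypothetical stand-in for the project's GpuConfig; the field name is an assumption.
#[derive(Debug, Clone)]
pub struct GpuConfig {
    /// Whether the TOML config asked for GPU processing.
    pub use_gpu: bool,
}

/// Hypothetical tag for the two backends named in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
    Cpu,
    Wgpu,
}

/// Pick WGPU only when it was requested and a GPU adapter was found;
/// otherwise fall back to the CPU backend. The choice is logged to stdout.
pub fn select_backend(config: &GpuConfig, gpu_available: bool) -> BackendKind {
    let kind = if config.use_gpu && gpu_available {
        BackendKind::Wgpu
    } else {
        BackendKind::Cpu
    };
    println!("Selected compute backend: {:?}", kind);
    kind
}
```

Falling back to the CPU when no adapter is present is a design assumption of this sketch, not something the document confirms.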

⚠️ What's Missing:

Backend is created but NOT USED in burst processing!

Current code (npu.rs:663 - process_burst()):

pub fn process_burst(&self) -> Result<BurstResult> {
    // Still uses old CPU code directly:
    let injection_result = phase1_injection_with_synapses(...)?;
    let dynamics_result = process_neural_dynamics(...)?;
    // ❌ Backend is never called!
}

What should happen:

pub fn process_burst(&self) -> Result<BurstResult> {
    // Should use backend abstraction:
    let mut backend = self.backend.lock().unwrap();
    let result = backend.process_burst(...)?;
    // ✅ Backend processes burst (CPU or GPU)
}

📊 The Gap

Backend field exists but is marked #[allow(dead_code)] because it's not integrated yet.

Why this happened:

- Backend abstraction was designed as a separate system
- Never integrated into the main NPU burst loop
- Old CPU code path still in use
- Backend is created but sits unused

Impact:

- Config works (backend is selected)
- But backend is never called
- Always uses CPU code path
- GPU backend functional but unreachable

🔧 What Needs to Be Done

Task: Integrate Backend into Burst Processing

Estimated Time: 2-3 days (minimal integration) to 2-3 weeks (comprehensive refactor)
Complexity: Medium (refactoring required)

Step 1: Refactor `process_burst()` to Use Backend

Current implementation uses direct function calls:

pub fn process_burst(&self) -> Result<BurstResult> {
    // Phase 1: Synaptic propagation
    let injection_result = phase1_injection_with_synapses(
        &mut fcl,
        &mut neuron_array,
        &mut propagation_engine,
        &previous_fq,
        power,
        &synapse_array,
        &pending_injections,
    )?;
    
    // Phase 2: Neural dynamics
    let dynamics_result = process_neural_dynamics(
        &fcl,
        &mut neuron_array,
        burst_count,
    )?;
}

Should become:

pub fn process_burst(&self) -> Result<BurstResult> {
    let mut backend = self.backend.lock().unwrap();
    let mut fire_structures = self.fire_structures.lock().unwrap();
    // Write lock: the backend call below takes `&mut neuron_array`
    let mut neuron_array = self.neuron_array.write().unwrap();
    let synapse_array = self.synapse_array.read().unwrap();
    
    // Get fired neurons from previous burst
    let fired_neurons = fire_structures.previous_fire_queue.get_all_neuron_ids();
    let fired_u32: Vec<u32> = fired_neurons.iter().map(|id| id.0).collect();
    
    // Use backend to process burst
    let result = backend.process_burst(
        &fired_u32,
        &synapse_array,
        &mut fire_structures.fire_candidate_list,
        &mut neuron_array,
        burst_count,
    )?;
    
    // Build fire queue from result
    // ... rest of processing
}

Step 2: Handle Power Injection

The backend doesn't know about "power neurons", so they must be injected into the FCL before the backend is called:

// Before calling backend, inject power neurons into FCL
for neuron_id in power_neurons {
    fire_structures.fire_candidate_list.add_candidate(neuron_id, power_amount);
}

// Then call backend
let result = backend.process_burst(...)?;

Step 3: Handle Sensory Injection

Similarly, staged sensory injections need to be handled:

// Inject staged sensory data into FCL.
// Reborrow the lock guard once so the two fields can be borrowed independently.
let fs = &mut *fire_structures;
for (neuron_id, potential) in &fs.pending_sensory_injections {
    fs.fire_candidate_list.add_candidate(*neuron_id, *potential);
}
fs.pending_sensory_injections.clear();

// Then call backend
let result = backend.process_burst(...)?;
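
Steps 2 and 3 can be combined into a single staging helper that runs before the backend call. The sketch below uses toy types: `FireCandidateList` backed by a `HashMap`, the `FireStructures` layout, the `Vec<(u32, f32)>` queue type, and the `stage_injections` helper are all assumptions for illustration, and whether the real `add_candidate` accumulates or overwrites potential is not stated in this document.

```rust
use std::collections::HashMap;

/// Toy fire candidate list (neuron id -> accumulated potential). Hypothetical.
#[derive(Default)]
pub struct FireCandidateList {
    candidates: HashMap<u32, f32>,
}

impl FireCandidateList {
    /// Adds potential to a neuron, creating the entry if needed (assumed behaviour).
    pub fn add_candidate(&mut self, neuron_id: u32, potential: f32) {
        *self.candidates.entry(neuron_id).or_insert(0.0) += potential;
    }

    pub fn get(&self, neuron_id: u32) -> Option<f32> {
        self.candidates.get(&neuron_id).copied()
    }
}

/// Toy version of the fire structures; field types are assumptions.
#[derive(Default)]
pub struct FireStructures {
    pub fire_candidate_list: FireCandidateList,
    pub pending_sensory_injections: Vec<(u32, f32)>,
}

/// Stage all external input in the FCL so the backend only sees the FCL.
pub fn stage_injections(fs: &mut FireStructures, power_neurons: &[u32], power_amount: f32) {
    // Power neurons receive a fixed amount every burst.
    for &neuron_id in power_neurons {
        fs.fire_candidate_list.add_candidate(neuron_id, power_amount);
    }
    // drain(..) moves the staged entries out and leaves the queue empty.
    for (neuron_id, potential) in fs.pending_sensory_injections.drain(..) {
        fs.fire_candidate_list.add_candidate(neuron_id, potential);
    }
}
```

Taking `&mut FireStructures` rather than the lock guard keeps the helper testable without a mutex; a caller holding a guard could pass `&mut *guard`.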

Step 4: Test Both Paths

- Test CPU backend path (should behave the same as the old direct code)
- Test GPU backend path (verify correctness against the CPU results)
- Ensure power injection works
- Ensure sensory injection works
- Ensure the fire queue and fire ledger still work
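
One possible shape for the CPU-vs-GPU correctness check is a comparison helper like the one below. It is a sketch under assumptions: `BurstOutput` and `compare_outputs` are hypothetical, and the real `BurstResult` may expose different data. The idea is that fired sets must match exactly, while floating-point potentials are compared within a tolerance because GPU and CPU arithmetic can round differently.

```rust
/// Hypothetical per-burst output used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct BurstOutput {
    pub fired: Vec<u32>,
    pub potentials: Vec<f32>,
}

/// Returns Ok(()) when both outputs agree: the same fired neurons
/// (order-insensitive) and every potential within `tol` of its counterpart.
pub fn compare_outputs(cpu: &BurstOutput, gpu: &BurstOutput, tol: f32) -> Result<(), String> {
    let mut cpu_fired = cpu.fired.clone();
    let mut gpu_fired = gpu.fired.clone();
    cpu_fired.sort_unstable();
    gpu_fired.sort_unstable();
    if cpu_fired != gpu_fired {
        return Err(format!("fired sets differ: cpu={:?} gpu={:?}", cpu_fired, gpu_fired));
    }
    if cpu.potentials.len() != gpu.potentials.len() {
        return Err(format!(
            "potential counts differ: cpu={} gpu={}",
            cpu.potentials.len(),
            gpu.potentials.len()
        ));
    }
    for (i, (c, g)) in cpu.potentials.iter().zip(&gpu.potentials).enumerate() {
        if (c - g).abs() > tol {
            return Err(format!("potential {} differs: cpu={} gpu={}", i, c, g));
        }
    }
    Ok(())
}
```

A test could run the same burst through both backends and call `compare_outputs(&cpu_out, &gpu_out, 1e-5)`; the tolerance value is a placeholder to be tuned.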

⚠️ Current Workaround

For now, I've added #[allow(dead_code)] to the backend field to suppress the warning.

Why: Backend integration into burst processing is a separate task from config wiring.

What works:

✅ Config is parsed correctly
✅ Backend is created (CPU or WGPU)
✅ Backend selection is logged
✅ Feature flags work

What doesn't work:

❌ Backend is not used during burst processing
❌ Still uses old CPU code path
❌ GPU backend is unreachable in practice

📋 Recommended Next Steps

Option A: Quick Integration (2-3 days)

Implement minimal backend integration:

- Update process_burst() to call backend.process_burst()
- Handle power/sensory injection before backend call
- Test that it works

Risk: May break existing functionality
Benefit: GPU actually gets used

Option B: Comprehensive Refactor (2-3 weeks)

Fully integrate backend abstraction:

- Refactor burst processing to use backend exclusively
- Remove old CPU code paths
- Add comprehensive tests (CPU vs GPU correctness)
- Validate performance

Risk: Major refactor, needs extensive testing
Benefit: Clean architecture, backend fully functional

Option C: Staged Approach (RECOMMENDED)

Week 1: Minimal integration

- Make backend functional (Option A)
- Keep old code as fallback
- Feature flag to toggle between old/new
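
The toggle with a fallback could look something like the sketch below. Everything here is hypothetical: the `BurstPath` enum, the `run_burst` helper, and the choice to fall back automatically when the backend path returns an error. A Cargo feature or a config key could drive the `path` argument; a runtime value is used here so that both paths stay testable in one build.

```rust
/// Which burst path to run. Hypothetical; could be set from a feature flag or config.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BurstPath {
    /// The existing direct CPU code.
    Legacy,
    /// The new backend abstraction (CPU or GPU).
    Backend,
}

/// Runs the requested path and reports which path actually produced the result.
/// If the backend path fails, the legacy path is used as a fallback.
pub fn run_burst<L, B>(path: BurstPath, legacy: L, backend: B) -> (BurstPath, Vec<u32>)
where
    L: FnOnce() -> Vec<u32>,
    B: FnOnce() -> Result<Vec<u32>, String>,
{
    match path {
        BurstPath::Legacy => (BurstPath::Legacy, legacy()),
        BurstPath::Backend => match backend() {
            Ok(fired) => (BurstPath::Backend, fired),
            Err(err) => {
                eprintln!("backend burst failed ({}); falling back to legacy path", err);
                (BurstPath::Legacy, legacy())
            }
        },
    }
}
```

Returning the path that was actually used makes silent fallbacks visible in logs and tests, which matters during the validation weeks.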

Week 2-3: Validation

- Test CPU backend (should match old code)
- Test GPU backend (compare to CPU)
- Performance benchmarking

Week 4+: Full migration

- Remove old code paths
- Backend becomes primary
- Production deployment

🎯 Bottom Line

Config wiring is COMPLETE ✅

Config parsed ✅
Backend created ✅
Logs show selection ✅

Backend integration is INCOMPLETE ⚠️

Backend exists but unused ❌
Burst processing uses old CPU code ❌
GPU path unreachable ❌

Next task: Integrate backend into process_burst() method (2-3 days to 2-3 weeks depending on approach)

📝 Technical Details

Current `process_burst()` Architecture:

RustNPU::process_burst()
    ↓
phase1_injection_with_synapses()  ← Direct CPU code
    ↓
process_neural_dynamics()         ← Direct CPU code
    ↓
archive_burst()
    ↓
sample_fire_queue()

Desired Architecture:

RustNPU::process_burst()
    ↓
backend.process_burst()           ← Backend abstraction
    ├─ CPU path  → process_synaptic_propagation() + process_neural_dynamics()
    └─ GPU path  → GPU shaders (WGSL)
    ↓
archive_burst()
    ↓
sample_fire_queue()
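
The dispatch in the desired architecture can be illustrated with a trait object behind a mutex. This is a simplified sketch, not the project's actual API: the `ComputeBackend` trait, its `process_burst` signature (potentials in, fired indices out), `CpuBackend`, and the `Npu` struct are all stand-ins invented for illustration.

```rust
use std::sync::Mutex;

/// Hypothetical backend interface; the real trait and signature may differ.
pub trait ComputeBackend: Send {
    fn name(&self) -> &'static str;
    /// Returns the indices of neurons whose potential reaches `threshold`.
    fn process_burst(&mut self, potentials: &[f32], threshold: f32) -> Vec<u32>;
}

/// CPU implementation of the toy contract above.
pub struct CpuBackend;

impl ComputeBackend for CpuBackend {
    fn name(&self) -> &'static str {
        "cpu"
    }

    fn process_burst(&mut self, potentials: &[f32], threshold: f32) -> Vec<u32> {
        potentials
            .iter()
            .enumerate()
            .filter(|(_, p)| **p >= threshold)
            .map(|(i, _)| i as u32)
            .collect()
    }
}

/// Toy NPU that owns whichever backend was selected at startup.
pub struct Npu {
    backend: Mutex<Box<dyn ComputeBackend>>,
}

impl Npu {
    pub fn new(backend: Box<dyn ComputeBackend>) -> Self {
        Self { backend: Mutex::new(backend) }
    }

    pub fn backend_name(&self) -> &'static str {
        self.backend.lock().unwrap().name()
    }

    /// The burst loop only talks to the trait; CPU or GPU is decided elsewhere.
    pub fn process_burst(&self, potentials: &[f32], threshold: f32) -> Vec<u32> {
        let mut backend = self.backend.lock().unwrap();
        backend.process_burst(potentials, threshold)
    }
}
```

A GPU implementation would plug in by implementing the same trait, so `Npu::process_burst` would not need to change when the backend does.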

🚀 Recommendation

For immediate use:

- Current implementation will compile and run
- Backend is selected (logged correctly)
- But uses CPU code path only

For GPU to actually work:

- Need to integrate backend into process_burst()
- Estimated: 2-3 days (minimal) to 2-3 weeks (comprehensive)
- Should be next priority after config wiring

Status: Config wiring complete, backend integration is next step
Document: See implementation details in this file
Last Updated: November 1, 2025
