GPU Backend Integration - Next Step
Date: November 1, 2025
Status: Config wiring complete, backend integration needed
Priority: HIGH
🎯 Current Status
✅ What's Complete:
-
GPU Backend Implementation (WGPU)
- 1,366 lines of code
- 4 WGSL shaders
- FCL-aware sparse processing
- Complete and functional
-
Configuration System
- TOML config exists and is parsed
GpuConfigstruct created- Config passed to NPU initialization
-
Backend Creation
- Backend is created based on config
- CPU or WGPU selected appropriately
- Logged to console
⚠️ What's Missing:
Backend is created but NOT USED in burst processing!
Current code (npu.rs:663 - process_burst()):
pub fn process_burst(&self) -> Result<BurstResult> {
// Still uses old CPU code directly:
let injection_result = phase1_injection_with_synapses(...)?;
let dynamics_result = process_neural_dynamics(...)?;
// ❌ Backend is never called!
}
What should happen:
pub fn process_burst(&self) -> Result<BurstResult> {
// Should use backend abstraction:
let mut backend = self.backend.lock().unwrap();
let result = backend.process_burst(...)?;
// ✅ Backend processes burst (CPU or GPU)
}
📊 The Gap
Backend field exists but is marked #[allow(dead_code)] because it's not integrated yet.
Why this happened:
- Backend abstraction was designed as a separate system
- Never integrated into the main NPU burst loop
- Old CPU code path still in use
- Backend is created but sits unused
Impact:
- Config works (backend is selected)
- But backend is never called
- Always uses CPU code path
- GPU backend functional but unreachable
🔧 What Needs to Be Done
Task: Integrate Backend into Burst Processing
Estimated Time: 2-3 weeks
Complexity: Medium (refactoring required)
Step 1: Refactor process_burst() to Use Backend
Current implementation uses direct function calls:
pub fn process_burst(&self) -> Result<BurstResult> {
// Phase 1: Synaptic propagation
let injection_result = phase1_injection_with_synapses(
&mut fcl,
&mut neuron_array,
&mut propagation_engine,
&previous_fq,
power,
&synapse_array,
&pending_injections,
)?;
// Phase 2: Neural dynamics
let dynamics_result = process_neural_dynamics(
&fcl,
&mut neuron_array,
burst_count,
)?;
}
Should become:
pub fn process_burst(&self) -> Result<BurstResult> {
let mut backend = self.backend.lock().unwrap();
let mut fire_structures = self.fire_structures.lock().unwrap();
let neuron_array = self.neuron_array.read().unwrap();
let synapse_array = self.synapse_array.read().unwrap();
// Get fired neurons from previous burst
let fired_neurons = fire_structures.previous_fire_queue.get_all_neuron_ids();
let fired_u32: Vec<u32> = fired_neurons.iter().map(|id| id.0).collect();
// Use backend to process burst
let result = backend.process_burst(
&fired_u32,
&synapse_array,
&mut fire_structures.fire_candidate_list,
&mut neuron_array,
burst_count,
)?;
// Build fire queue from result
// ... rest of processing
}
Step 2: Handle Power Injection
The backend doesn't know about "power neurons" - need to inject them before calling backend:
// Before calling backend, inject power neurons into FCL
for neuron_id in power_neurons {
fire_structures.fire_candidate_list.add_candidate(neuron_id, power_amount);
}
// Then call backend
let result = backend.process_burst(...)?;
Step 3: Handle Sensory Injection
Similarly, staged sensory injections need to be handled:
// Inject staged sensory data into FCL
for (neuron_id, potential) in &fire_structures.pending_sensory_injections {
fire_structures.fire_candidate_list.add_candidate(*neuron_id, *potential);
}
fire_structures.pending_sensory_injections.clear();
// Then call backend
let result = backend.process_burst(...)?;
Step 4: Test Both Paths
- Test CPU backend path (should work same as before)
- Test GPU backend path (verify correctness vs CPU)
- Ensure power injection works
- Ensure sensory injection works
- Ensure fire queue, fire ledger work
⚠️ Current Workaround
For now, I've added #[allow(dead_code)] to the backend field to suppress the warning.
Why: Backend integration into burst processing is a separate task from config wiring.
What works:
- ✅ Config is parsed correctly
- ✅ Backend is created (CPU or WGPU)
- ✅ Backend selection is logged
- ✅ Feature flags work
What doesn't work:
- ❌ Backend is not used during burst processing
- ❌ Still uses old CPU code path
- ❌ GPU backend is unreachable in practice
📋 Recommended Next Steps
Option A: Quick Integration (2-3 days)
Implement minimal backend integration:
- Update
process_burst()to callbackend.process_burst() - Handle power/sensory injection before backend call
- Test that it works
Risk: May break existing functionality
Benefit: GPU actually gets used
Option B: Comprehensive Refactor (2-3 weeks)
Fully integrate backend abstraction:
- Refactor burst processing to use backend exclusively
- Remove old CPU code paths
- Add comprehensive tests (CPU vs GPU correctness)
- Validate performance
Risk: Major refactor, needs extensive testing
Benefit: Clean architecture, backend fully functional
Option C: Staged Approach (RECOMMENDED)
Week 1: Minimal integration
- Make backend functional (Option A)
- Keep old code as fallback
- Feature flag to toggle between old/new
Week 2-3: Validation
- Test CPU backend (should match old code)
- Test GPU backend (compare to CPU)
- Performance benchmarking
Week 4+: Full migration
- Remove old code paths
- Backend becomes primary
- Production deployment
🎯 Bottom Line
Config wiring is COMPLETE ✅
- Config parsed ✅
- Backend created ✅
- Logs show selection ✅
Backend integration is INCOMPLETE ⚠️
- Backend exists but unused ❌
- Burst processing uses old CPU code ❌
- GPU path unreachable ❌
Next task: Integrate backend into process_burst() method (2-3 days to 2-3 weeks depending on approach)
📝 Technical Details
Current process_burst() Architecture:
RustNPU::process_burst()
↓
phase1_injection_with_synapses() ← Direct CPU code
↓
process_neural_dynamics() ← Direct CPU code
↓
archive_burst()
↓
sample_fire_queue()
Desired Architecture:
RustNPU::process_burst()
↓
backend.process_burst() ← Backend abstraction
├─ CPU path → process_synaptic_propagation() + process_neural_dynamics()
└─ GPU path → GPU shaders (WGSL)
↓
archive_burst()
↓
sample_fire_queue()
🚀 Recommendation
For immediate use:
- Current implementation will compile and run
- Backend is selected (logged correctly)
- But uses CPU code path only
For GPU to actually work:
- Need to integrate backend into
process_burst() - Estimated: 2-3 days (minimal) to 2-3 weeks (comprehensive)
- Should be next priority after config wiring
Status: Config wiring complete, backend integration is next step
Document: See implementation details in this file
Last Updated: November 1, 2025