naxodev/mar
refactor: client/voice (audio playback TTS) (#16)

## PR Type

What kind of change does this PR introduce?

<!-- Please check the one that applies to this PR using "x". -->

```
[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[x] Refactoring (no functional changes, no api changes)
[ ] Build related changes
[ ] CI related changes
[ ] Documentation content changes
[ ] Other... Please describe:
```

## What is the current behavior?

The voice functionality is currently tightly coupled in the React layer with:

- Audio capture logic embedded in `use-maelstrom-voice.ts` (using inline AudioWorklet code)
- Audio playback handling mixed with state management
- VAD logic partially in React hooks
- Complex orchestration spread across multiple files
- Limited test coverage for audio operations

## What is the new behavior?

This PR refactors the voice architecture by extracting concerns into dedicated packages:

1. **`audio-capture`**: New standalone package with `AudioCapture` class handling microphone access, AudioWorklet management, and audio streaming
2. **`audio-playback`**: New standalone package with `AudioPlayback` class for TTS audio queue management and scheduling
3. **`voice-session`**: New package containing `VoiceSession` class that orchestrates STT/TTS flow, VAD integration, and state management
4. **Simplified React hooks**: `use-maelstrom-voice.ts` reduced from ~800 lines to ~200 lines by delegating to `VoiceSession`
5. **Improved VAD**: Better Silero VAD integration with configurable timeout options
6. **Enhanced testing**: Added comprehensive test suites for audio-capture (328 lines), audio-playback (194 lines), and voice-session (1238 lines)
7. **Better concurrency**: Fixed race conditions in audio playback and capture
8. **Code organization**: Moved all audio processing logic from React layer to client library

API remains unchanged for consumers of the React hook.

## Does this PR introduce a breaking change?

```
[ ] Yes
[x] No
```

<!-- If this PR contains a breaking change, please describe the impact and migration path for existing applications below. -->

## Other information

- Removed ~700 lines from `use-maelstrom-voice.ts` and ~777 lines from `voice-connector.ts`
- Total net addition of ~1,300 lines (mostly new tests and documentation)
- All existing voice functionality preserved
- Improves maintainability by separating audio concerns from React UI logic
- Enables reuse of voice logic in non-React contexts
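For readers unfamiliar with what "queue management and scheduling" for TTS audio typically involves, here is a minimal sketch of one common approach: each incoming chunk is scheduled to start exactly where the previous one ends, and a generation counter lets late callbacks from a stopped stream be ignored. This is not the `AudioPlayback` implementation from this PR. `PlaybackScheduler`, `ScheduledChunk`, and all method names are hypothetical, and the clock is passed in as a plain number so the logic can run outside a browser.

```typescript
// Hypothetical sketch: none of these names come from the PR's packages.
interface ScheduledChunk {
  startTime: number;  // seconds on the audio clock
  endTime: number;    // seconds on the audio clock
  generation: number; // playback run this chunk belongs to
}

class PlaybackScheduler {
  private nextStartTime = 0;
  private generation = 0;

  // `now` is the current audio-clock time in seconds.
  // Chunks are placed back to back; after an underrun (the queue ran dry),
  // the next chunk starts at `now` instead of in the past.
  schedule(durationSec: number, now: number): ScheduledChunk {
    if (durationSec <= 0) {
      throw new RangeError("durationSec must be positive");
    }
    const startTime = Math.max(now, this.nextStartTime);
    const endTime = startTime + durationSec;
    this.nextStartTime = endTime;
    return { startTime, endTime, generation: this.generation };
  }

  // Invalidate everything scheduled so far (for example, when the user interrupts).
  stop(): void {
    this.generation += 1;
    this.nextStartTime = 0;
  }

  // Callbacks for chunks from an earlier generation should be dropped,
  // which is one way to avoid races between stop() and late "ended" events.
  isCurrent(chunk: ScheduledChunk): boolean {
    return chunk.generation === this.generation;
  }
}
```

In a browser, `now` would usually be `AudioContext.currentTime`, and the returned `startTime` would be passed to `AudioBufferSourceNode.start()`. Keeping the timing logic free of Web Audio objects is a design choice that makes it straightforward to unit test.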
nx run-many -t e2e
Succeeded
CI Pipeline Execution
Linux
2 CPU cores
read-write
access token used
ddbd47eb
main