Overview - Ai

Latest CI Pipeline Executions

Succeeded
main
ff338557 docs: regenerate API documentation (#507) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
by github-act...
Succeeded
main
ff338557 docs: regenerate API documentation (#507) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
by github-act...
Failed
main
f6a94b56 ci: Version Packages (#498) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
by github-act...
Failed
main
f6a94b56 Merge 23ad8e0dae7a7f4e1df35771bb405acbdd081c00 into af19dcc69dbde2535d4016d64fbefc891198ed71
by Tim Raders...
Succeeded
main
af19dcc6 ci: Version Packages (#498) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
by github-act...
Succeeded
main
af19dcc6 ci: Version Packages (#498) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
by github-act...
Failed
main
cb6be753 feat(ai-grok): audio, speech, and realtime adapters + example wiring (#506)
* feat(ai-grok): add audio and speech adapters for xAI
  Add `grokSpeech` (TTS via /v1/tts), `grokTranscription` (STT via /v1/stt), and `grokRealtime` + `grokRealtimeToken` (Voice Agent via /v1/realtime) because xAI's standalone audio APIs were shipped publicly and the adapter previously exposed only text/image/summarize. The TTS/STT endpoints are not OpenAI-compatible so these adapters use direct fetch rather than the OpenAI SDK; the realtime API mirrors OpenAI's shape with URL/provider swaps. E2E coverage is wired via mock.mount('/v1/tts'...) on aimock.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Merge from upstream
* feat(ai-grok): wire shared debug logger into audio and realtime adapters
  Adopt the @tanstack/ai/adapter-internals logger across grokSpeech, grokTranscription, grokRealtimeToken, and grokRealtime so users can toggle debug output the same way they do on other adapters: `debug: true` for full tracing, `debug: false` to silence, or a DebugConfig for per-category control and a custom Logger. Replaces the remaining console.error / console.warn calls in the realtime adapter with logger.errors so nothing is lost when debugging is off.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: apply automated fixes
* fix(ai-grok): correct super() arg order in audio adapters
  The transcription and TTS adapters were calling super(config, model), but BaseTranscriptionAdapter/BaseTTSAdapter expect (model, config), causing TS2345 build errors.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ai-grok): pass logger to audio adapter tests
  After the logger was wired into the audio adapters, the unit tests need to provide one when calling transcribe/generateSpeech directly (activities normally inject it via resolveDebugOption).
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(ai-grok): route audio adapter tests through core functions
  Per project convention, tests should not invoke adapter methods directly; they call generateSpeech()/generateTranscription() with the adapter instance, so the core function injects logger, emits events, and exercises the real public surface.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: apply automated fixes
* fix(ai-grok): address cr-loop round 1 findings
  ai-grok realtime adapter:
    - cleanup pc/localStream/audioContext/dataChannel on connect() failure
    - dataChannelReady rejects on error/close/ICE-failed/timeout
    - RTCErrorEvent extracted properly instead of [object Event]
    - onmessage parse errors emit to consumers
    - input_audio_transcription no longer overrides caller on every update
    - response.done preserves idle mode after stopAudioCapture
    - setupOutputAudioAnalysis disposes prior audioElement, surfaces autoplay blocks
    - audioContext.resume failures emit error instead of silent swallow
    - currentMessageId reset on response.created (tool-only turns)
    - pc.onconnectionstatechange / oniceconnectionstatechange emit status_change
    - sendImage uses object image_url for OpenAI-realtime compatibility
    - unknown server events logged via default branch
  ai-grok TTS/STT:
    - getContentType returns audio/L16 for pcm (valid IANA MIME)
    - toAudioFile requires explicit audio_format for bare base64
    - transcription option renamed format -> inverse_text_normalization
  ai-grok realtime token:
    - expires_at unit-safety guard (seconds vs ms)
  ai-grok types:
    - single source of truth for GrokRealtimeModel (model-meta)
  ai-grok tests:
    - cover aac/flac in pickCodec test
    - normalize header assertions via Headers()
    - add realtime-token unit-safety tests
  examples/ts-react-chat:
    - resolveModel fails loud via InvalidModelOverrideError (no silent fallback)
    - audio/speech/transcribe routes return 400 with structured body
  testing/e2e:
    - media-providers uses valid grok-2-image-1212 model
    - test-matrix imports from feature-support (dedupe)
* fix(ai-grok): address cr-loop round 2 confirmation findings
  ai-grok realtime adapter:
    - shared teardownConnection() helper runs on SDP and post-SDP failure paths, disposing input/output analysers, audio element, mic, data channel, pc, and audio context
    - pre-open dataChannelReady rejection on failed/closed/disconnected pc states
    - pc.onconnectionstatechange is sole source of status_change (ice handler only rejects)
    - sendImage detects data: prefix (no more double-wrap)
  ai-grok audio utils:
    - malformed data: URI MIME parse throws instead of silently defaulting to audio/mpeg
    - empty/missing base64 payload throws
    - explicit audioFormat argument wins over URI-embedded MIME
  ai-grok TTS:
    - audio/L16 content-type includes required rate= parameter from modelOptions.sample_rate
  ai-grok tests:
    - realtime-token afterEach restores original XAI_API_KEY
    - new coverage for malformed data URIs, audioFormat precedence, and rate= in audio/L16
  examples/ts-react-chat:
    - new UnknownProviderError typed class, 400 mapping in audio/speech/transcribe routes
    - server-fns ServerFnError wraps typed adapter errors with stable code/details
* fix(ai-grok): address cr-loop round 3 confirmation findings
  examples/ts-react-chat:
    - generateSpeechFn/transcribeFn/generateSpeechStreamFn/transcribeStreamFn now wrap adapter construction with rethrowAudioAdapterError for consistent typed-error responses
    - realtime image display guards against data:/http(s): double-wrap
  ai-grok realtime adapter:
    - teardownConnection drains pendingEvents; sendEvent logs and skips after teardown
  ai-grok TTS:
    - sample_rate always forwarded in output_format so body and contentType rate agree
* fix(ai-grok): address cr-loop round 4 confirmation findings
  ai-grok realtime adapter:
    - teardownConnection on getUserMedia failure (mic/pc/dataChannel leak on mic denial)
    - response.function_call_arguments.done drops event if call_id absent (no item_id fallback)
    - isTornDown set at top of teardown to guard handlers firing during close() awaits
    - setupInputAudioAnalysis/setupOutputAudioAnalysis skip when torn down
    - onconnectionstatechange no longer double-emits status_change during disconnect()
  ai-grok audio utils:
    - toAudioFile Blob/File branch prefers explicit audioFormat over Blob.type
  ai-grok TTS:
    - sample_rate forwarded only when caller provides one or codec is pcm (don't override server defaults for container codecs)
  Tests updated to cover new audioFormat precedence paths and adjusted sample_rate assertions.
* fix(ai-grok): address cr-loop round 5 confirmation findings
  ai-grok realtime adapter:
    - pc.connectionState=failed triggers automatic teardownConnection (mic/pc/audioContext no longer leak on spontaneous failure)
    - flushPendingEvents wraps send in try/catch; emits error on failure instead of hanging caller
    - handleServerEvent case 'error' validates shape of event.error; preserves code/type/param; safe against null/missing fields
    - autoplay and audioContext.resume failures log without emitting fatal error events (routine browser-policy outcomes)
    - dataChannel.onerror/onclose gated behind isTornDown to suppress post-disconnect error events
  examples/ts-react-chat:
    - realtime.tsx handleImageUpload validates FileReader result, file.type, and base64 extraction; surfaces errors visibly
* fix(ai-grok): extensionFor maps mulaw/alaw MIME types to sensible filenames
  utils/audio.ts produced 'audio.basic' and 'audio.x-alaw-basic' for mulaw/alaw via the default-branch MIME split. Servers using filename as a format hint now see 'audio.mulaw' / 'audio.alaw', matching the reverse toMimeType mapping.
* ci: apply automated fixes
* refactor(ai-grok): extract form/body builders, adopt ModelMeta convention, fix xAI realtime event names
  Refactors from user review:
  adapters:
    - tts.ts: extract buildTTSRequestBody helper (codec/sample_rate/voice default resolution + body assembly). Export getContentType for consumer use.
    - transcription.ts: extract buildTranscriptionFormData helper (wire-field mapping including xAI's named 'format' boolean toggle for inverse text normalization).
  model-meta.ts: audio and realtime models now use the same `as const satisfies ModelMeta` convention as chat/image models (GROK_TTS, GROK_STT, GROK_VOICE_FAST_1, GROK_VOICE_THINK_FAST_1) with input/output modalities and tool_calling / reasoning capabilities.
  realtime adapter:
    - Replace drive-by 'as' casts on untyped server events with runtime-checked readers (readString, readObject, readObjectArray); malformed frames return undefined instead of throwing a TypeError.
    - Accept both legacy OpenAI-realtime event names and current xAI voice-agent names per docs.x.ai: response.output_audio.* / response.output_audio_transcript.* / response.text.* (plus existing response.audio.* / response.audio_transcript.* / response.output_text.* aliases for compatibility).
    - RealtimeServerError type replaces repeated 'as Error & { code?: string }' casts.
  realtime token:
    - Wrap request body with { session: { model } } per xAI /v1/realtime/client_secrets schema (was bare { model } before).
* test(ai-grok): cover realtime token body { session: { model } } shape
* ci: apply automated fixes
* refactor(ai-grok): drop @tanstack/ai-client peer dep by inlining realtime contract
  The RealtimeAdapter / RealtimeConnection interfaces are duplicated locally in src/realtime/realtime-contract.ts. The adapter imports them from there instead of @tanstack/ai-client, so consumers of @tanstack/ai-grok no longer have to install @tanstack/ai-client unless they also want to construct a RealtimeClient from it (structural typing covers that use case). @tanstack/ai-client stays as a devDependency to run a type-level drift check (tests/realtime-contract.drift.test-d.ts) that asserts our inlined contract is bidirectionally assignable to the canonical one. If ai-client ever changes the interface, that file will fail to compile and we update both in lockstep. publint --strict: clean.
* ci: apply automated fixes
* fix(ai-grok): address CodeRabbit PR review
    - tts.ts / transcription.ts: spread `defaultHeaders` BEFORE Authorization / Content-Type so a caller-supplied header can't silently clobber the bearer token or auth content-type.
    - utils/audio.ts: new `arrayBufferToBase64` helper: Buffer fast path on Node, chunked btoa fallback everywhere else (browser, Workers, Bun). Replaces the Node-only `Buffer.from(arrayBuffer).toString('base64')` in tts.ts.
    - transcription.ts: new `GrokTranscriptionWord` interface extends the core `TranscriptionWord` with optional `confidence` and `speaker`. The adapter now preserves both fields when xAI returns them, so callers that narrow via `as Array<GrokTranscriptionWord>` get the diarization output they asked for. Test expectations updated.
    - tts.ts: mulaw/alaw `contentType` now includes a `;rate=…` parameter (as `audio/PCMU` / `audio/PCMA` per RFC 3551) when the caller requests a non-default sample rate, instead of the 8 kHz-implying `audio/basic` / `audio/x-alaw-basic`.
    - realtime/adapter.ts: `conversation.item.truncated` flips mode back to `listening` so the visualiser can't get stuck on `speaking` after an interrupt. `sendEvent` wraps `dataChannel.send` in try/catch consistent with `flushPendingEvents`. The shared `emptyFrequencyData` / `emptyTimeDomainData` buffers are gone; `getAudioVisualization` returns a fresh `Uint8Array` per call so consumers can't mutate a module-level instance.
    - realtime/token.ts: adds a 15s `AbortController` timeout on the client_secrets request so a dead endpoint can't hang the caller forever. Validates `client_secret.value` / `expires_at` shape at runtime before dereferencing so a malformed response throws a descriptive error.
    - realtime/realtime-contract.ts: JSDoc filename ref updated.
    - examples/ts-react-chat audio/speech/transcribe routes: unify the 400 unknown_provider payload under the `provider` key (was `providerId`) to match the invalid_model_override branch and the request body.
* ci: apply automated fixes
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Alem Tuzlak <t.zlak@hotmail.com>
by Tom Beckenham
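The #506 commit message describes `arrayBufferToBase64` as a Buffer fast path on Node with a chunked btoa fallback elsewhere. The sketch below illustrates that general technique only; the actual helper in utils/audio.ts is not shown on this page, so the chunk size, the `base64ViaBtoa` name, and the `BufferLike` type are assumptions made for illustration.

```typescript
// Hypothetical sketch of a cross-runtime base64 encoder. The real helper in
// utils/audio.ts is not shown in the commit message and may differ.

const CHUNK_SIZE = 0x8000; // assumed: bytes converted per String.fromCharCode call

// Minimal structural type for Node's Buffer so the sketch needs no @types/node.
type BufferLike = {
  from(data: ArrayBufferLike): { toString(encoding: 'base64'): string };
};

// Portable path: build a binary string chunk by chunk, then hand it to btoa.
// Chunking keeps the argument list of each fromCharCode call bounded.
export function base64ViaBtoa(bytes: Uint8Array): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i += CHUNK_SIZE) {
    const chunk = bytes.subarray(i, i + CHUNK_SIZE);
    binary += String.fromCharCode(...Array.from(chunk));
  }
  return btoa(binary);
}

export function arrayBufferToBase64(buffer: ArrayBufferLike): string {
  // Fast path: use Buffer when the runtime exposes it (Node and compatible runtimes).
  const NodeBuffer = (globalThis as unknown as { Buffer?: BufferLike }).Buffer;
  if (NodeBuffer !== undefined) {
    return NodeBuffer.from(buffer).toString('base64');
  }
  // Fallback for runtimes without Buffer.
  return base64ViaBtoa(new Uint8Array(buffer));
}
```

Both paths should yield the same string for the same bytes, so a quick check is to compare `arrayBufferToBase64(bytes.buffer)` against `base64ViaBtoa(bytes)` on a payload larger than one chunk.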
Failed
main
cb6be753 feat(ai-grok): audio, speech, and realtime adapters + example wiring (#506)
by Tom Beckenham
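The CodeRabbit fix in #506 spreads `defaultHeaders` before Authorization / Content-Type so caller-supplied headers cannot replace them. A minimal sketch of that ordering follows; `buildAuthHeaders`, `HeaderMap`, and the `application/json` content type are hypothetical names and values chosen for illustration, not taken from tts.ts or transcription.ts.

```typescript
// Hypothetical helper illustrating spread order only; not the adapter's code.
type HeaderMap = Record<string, string>;

export function buildAuthHeaders(
  apiKey: string,
  defaultHeaders: HeaderMap = {},
): HeaderMap {
  return {
    // Caller-supplied defaults go first...
    ...defaultHeaders,
    // ...so the keys below win whenever the same key appears in both.
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json', // assumed content type for the sketch
  };
}
```

In an object literal, later properties overwrite earlier ones with the same key, which is why the order matters. Object keys are case-sensitive, so a caller passing `authorization` in lowercase would produce a second, separate key; the sketch does not handle that case.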
Succeeded
main
2e4c9429 feat(ai-grok): audio, speech, and realtime adapters + example wiring (#506)
- examples/ts-react-chat audio/speech/transcribe routes: unify the 400 unknown_provider payload under the `provider` key (was `providerId`) to match the invalid_model_override branch and the request body. * ci: apply automated fixes --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Alem Tuzlak <t.zlak@hotmail.com>
by Tom Beckenham
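The commit above describes an `arrayBufferToBase64` helper with a Buffer fast path on Node and a chunked `btoa` fallback elsewhere. A minimal sketch of that idea follows. Only the helper name comes from the commit message; the chunk size, the `Base64Globals` type, and the body are assumptions, not the code in utils/audio.ts.

```typescript
// Hypothetical sketch; not the actual utils/audio.ts implementation.
// A narrow view of the globals we probe, so this compiles with or without
// Node or DOM typings.
type Base64Globals = {
  Buffer?: { from(data: Uint8Array): { toString(encoding: 'base64'): string } }
  btoa?: (binary: string) => string
}

function arrayBufferToBase64(buffer: ArrayBuffer): string {
  const g = globalThis as unknown as Base64Globals
  const bytes = new Uint8Array(buffer)

  // Fast path: runtimes that expose a global Buffer (Node).
  if (g.Buffer) {
    return g.Buffer.from(bytes).toString('base64')
  }
  if (!g.btoa) {
    throw new Error('No base64 encoder available in this runtime')
  }

  // Fallback: build the binary string in chunks so String.fromCharCode
  // never receives an oversized argument list. The chunk size is an assumption.
  const CHUNK_SIZE = 0x8000
  let binary = ''
  for (let i = 0; i < bytes.length; i += CHUNK_SIZE) {
    const chunk = Array.from(bytes.subarray(i, i + CHUNK_SIZE))
    binary += String.fromCharCode.apply(null, chunk)
  }
  return g.btoa(binary)
}
```

Chunking matters only on the fallback path: passing a multi-megabyte byte array to `String.fromCharCode` in one call can exceed the engine's argument limit.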
Succeeded
main
2e4c9429 feat(ai-grok): audio, speech, and realtime adapters + example wiring (#506) * feat(ai-grok): add audio and speech adapters for xAI Add `grokSpeech` (TTS via /v1/tts), `grokTranscription` (STT via /v1/stt), and `grokRealtime` + `grokRealtimeToken` (Voice Agent via /v1/realtime) because xAI's standalone audio APIs were shipped publicly and the adapter previously exposed only text/image/summarize. The TTS/STT endpoints are not OpenAI-compatible so these adapters use direct fetch rather than the OpenAI SDK; the realtime API mirrors OpenAI's shape with URL/provider swaps. E2E coverage is wired via mock.mount('/v1/tts'...) on aimock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Merge from upstream * feat(ai-grok): wire shared debug logger into audio and realtime adapters Adopt the @tanstack/ai/adapter-internals logger across grokSpeech, grokTranscription, grokRealtimeToken, and grokRealtime so users can toggle debug output the same way they do on other adapters — `debug: true` for full tracing, `debug: false` to silence, or a DebugConfig for per-category control and a custom Logger. Replaces the remaining console.error / console.warn calls in the realtime adapter with logger.errors so nothing is lost when debugging is off. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: apply automated fixes * fix(ai-grok): correct super() arg order in audio adapters The transcription and TTS adapters were calling super(config, model), but BaseTranscriptionAdapter/BaseTTSAdapter expect (model, config), causing TS2345 build errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ai-grok): pass logger to audio adapter tests After the logger was wired into the audio adapters, the unit tests need to provide one when calling transcribe/generateSpeech directly (activities normally inject it via resolveDebugOption). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(ai-grok): route audio adapter tests through core functions Per project convention, tests should not invoke adapter methods directly — they call generateSpeech()/generateTranscription() with the adapter instance, so the core function injects logger, emits events, and exercises the real public surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: apply automated fixes * fix(ai-grok): address cr-loop round 1 findings ai-grok realtime adapter: - cleanup pc/localStream/audioContext/dataChannel on connect() failure - dataChannelReady rejects on error/close/ICE-failed/timeout - RTCErrorEvent extracted properly instead of [object Event] - onmessage parse errors emit to consumers - input_audio_transcription no longer overrides caller on every update - response.done preserves idle mode after stopAudioCapture - setupOutputAudioAnalysis disposes prior audioElement, surfaces autoplay blocks - audioContext.resume failures emit error instead of silent swallow - currentMessageId reset on response.created (tool-only turns) - pc.onconnectionstatechange / oniceconnectionstatechange emit status_change - sendImage uses object image_url for OpenAI-realtime compatibility - unknown server events logged via default branch ai-grok TTS/STT: - getContentType returns audio/L16 for pcm (valid IANA MIME) - toAudioFile requires explicit audio_format for bare base64 - transcription option renamed format -> inverse_text_normalization ai-grok realtime token: - expires_at unit-safety guard (seconds vs ms) ai-grok types: - single source of truth for GrokRealtimeModel (model-meta) ai-grok tests: - cover aac/flac in pickCodec test - normalize header assertions via Headers() - add realtime-token unit-safety tests examples/ts-react-chat: - resolveModel fails loud via InvalidModelOverrideError (no silent fallback) - audio/speech/transcribe routes return 400 with structured body testing/e2e: - 
media-providers uses valid grok-2-image-1212 model - test-matrix imports from feature-support (dedupe) * fix(ai-grok): address cr-loop round 2 confirmation findings ai-grok realtime adapter: - shared teardownConnection() helper runs on SDP and post-SDP failure paths, disposing input/output analysers, audio element, mic, data channel, pc, and audio context - pre-open dataChannelReady rejection on failed/closed/disconnected pc states - pc.onconnectionstatechange is sole source of status_change (ice handler only rejects) - sendImage detects data: prefix (no more double-wrap) ai-grok audio utils: - malformed data: URI MIME parse throws instead of silently defaulting to audio/mpeg - empty/missing base64 payload throws - explicit audioFormat argument wins over URI-embedded MIME ai-grok TTS: - audio/L16 content-type includes required rate= parameter from modelOptions.sample_rate ai-grok tests: - realtime-token afterEach restores original XAI_API_KEY - new coverage for malformed data URIs, audioFormat precedence, and rate= in audio/L16 examples/ts-react-chat: - new UnknownProviderError typed class, 400 mapping in audio/speech/transcribe routes - server-fns ServerFnError wraps typed adapter errors with stable code/details * fix(ai-grok): address cr-loop round 3 confirmation findings examples/ts-react-chat: - generateSpeechFn/transcribeFn/generateSpeechStreamFn/transcribeStreamFn now wrap adapter construction with rethrowAudioAdapterError for consistent typed-error responses - realtime image display guards against data:/http(s): double-wrap ai-grok realtime adapter: - teardownConnection drains pendingEvents; sendEvent logs and skips after teardown ai-grok TTS: - sample_rate always forwarded in output_format so body and contentType rate agree * fix(ai-grok): address cr-loop round 4 confirmation findings ai-grok realtime adapter: - teardownConnection on getUserMedia failure (mic/pc/dataChannel leak on mic denial) - response.function_call_arguments.done drops event if call_id 
absent (no item_id fallback) - isTornDown set at top of teardown to guard handlers firing during close() awaits - setupInputAudioAnalysis/setupOutputAudioAnalysis skip when torn down - onconnectionstatechange no longer double-emits status_change during disconnect() ai-grok audio utils: - toAudioFile Blob/File branch prefers explicit audioFormat over Blob.type ai-grok TTS: - sample_rate forwarded only when caller provides one or codec is pcm (don't override server defaults for container codecs) Tests updated to cover new audioFormat precedence paths and adjusted sample_rate assertions. * fix(ai-grok): address cr-loop round 5 confirmation findings ai-grok realtime adapter: - pc.connectionState=failed triggers automatic teardownConnection (mic/pc/audioContext no longer leak on spontaneous failure) - flushPendingEvents wraps send in try/catch; emits error on failure instead of hanging caller - handleServerEvent case 'error' validates shape of event.error; preserves code/type/param; safe against null/missing fields - autoplay and audioContext.resume failures log without emitting fatal error events (routine browser-policy outcomes) - dataChannel.onerror/onclose gated behind isTornDown to suppress post-disconnect error events examples/ts-react-chat: - realtime.tsx handleImageUpload validates FileReader result, file.type, and base64 extraction; surfaces errors visibly * fix(ai-grok): extensionFor maps mulaw/alaw MIME types to sensible filenames utils/audio.ts produced 'audio.basic' and 'audio.x-alaw-basic' for mulaw/alaw via the default-branch MIME split. Servers using filename as a format hint now see 'audio.mulaw' / 'audio.alaw', matching the reverse toMimeType mapping. * ci: apply automated fixes * refactor(ai-grok): extract form/body builders, adopt ModelMeta convention, fix xAI realtime event names Refactors from user review: adapters: - tts.ts: extract buildTTSRequestBody helper (codec/sample_rate/voice default resolution + body assembly). 
Export getContentType for consumer use. - transcription.ts: extract buildTranscriptionFormData helper (wire-field mapping including xAI's named 'format' boolean toggle for inverse text normalization). model-meta.ts: audio and realtime models now use the same `as const satisfies ModelMeta` convention as chat/image models (GROK_TTS, GROK_STT, GROK_VOICE_FAST_1, GROK_VOICE_THINK_FAST_1) with input/output modalities and tool_calling / reasoning capabilities. realtime adapter: - Replace drive-by 'as' casts on untyped server events with runtime-checked readers (readString, readObject, readObjectArray); malformed frames return undefined instead of throwing a TypeError. - Accept both legacy OpenAI-realtime event names and current xAI voice-agent names per docs.x.ai: response.output_audio.* / response.output_audio_transcript.* / response.text.* (plus existing response.audio.* / response.audio_transcript.* / response.output_text.* aliases for compatibility). - RealtimeServerError type replaces repeated 'as Error & { code?: string }' casts. realtime token: - Wrap request body with { session: { model } } per xAI /v1/realtime/client_secrets schema (was bare { model } before). * test(ai-grok): cover realtime token body { session: { model } } shape * ci: apply automated fixes * refactor(ai-grok): drop @tanstack/ai-client peer dep by inlining realtime contract The RealtimeAdapter / RealtimeConnection interfaces are duplicated locally in src/realtime/realtime-contract.ts. The adapter imports them from there instead of @tanstack/ai-client, so consumers of @tanstack/ai-grok no longer have to install @tanstack/ai-client unless they also want to construct a RealtimeClient from it (structural typing covers that use case). @tanstack/ai-client stays as a devDependency to run a type-level drift check (tests/realtime-contract.drift.test-d.ts) that asserts our inlined contract is bidirectionally assignable to the canonical one. 
If ai-client ever changes the interface, that file will fail to compile and we update both in lockstep. publint --strict: clean. * ci: apply automated fixes * fix(ai-grok): address CodeRabbit PR review - tts.ts / transcription.ts: spread `defaultHeaders` BEFORE Authorization / Content-Type so a caller-supplied header can't silently clobber the bearer token or auth content-type. - utils/audio.ts: new `arrayBufferToBase64` helper — Buffer fast path on Node, chunked btoa fallback everywhere else (browser, Workers, Bun). Replaces the Node-only `Buffer.from(arrayBuffer).toString('base64')` in tts.ts. - transcription.ts: new `GrokTranscriptionWord` interface extends the core `TranscriptionWord` with optional `confidence` and `speaker`. The adapter now preserves both fields when xAI returns them, so callers that narrow via `as Array<GrokTranscriptionWord>` get the diarization output they asked for. Test expectations updated. - tts.ts: mulaw/alaw `contentType` now includes a `;rate=…` parameter (as `audio/PCMU` / `audio/PCMA` per RFC 3551) when the caller requests a non-default sample rate, instead of the 8 kHz-implying `audio/basic` / `audio/x-alaw-basic`. - realtime/adapter.ts: `conversation.item.truncated` flips mode back to `listening` so the visualiser can't get stuck on `speaking` after an interrupt. `sendEvent` wraps `dataChannel.send` in try/catch consistent with `flushPendingEvents`. The shared `emptyFrequencyData` / `emptyTimeDomainData` buffers are gone — `getAudioVisualization` returns a fresh `Uint8Array` per call so consumers can't mutate a module-level instance. - realtime/token.ts: adds a 15s `AbortController` timeout on the client_secrets request so a dead endpoint can't hang the caller forever. Validates `client_secret.value` / `expires_at` shape at runtime before dereferencing so a malformed response throws a descriptive error. - realtime/realtime-contract.ts: JSDoc filename ref updated. 
- examples/ts-react-chat audio/speech/transcribe routes: unify the 400 unknown_provider payload under the `provider` key (was `providerId`) to match the invalid_model_override branch and the request body. * ci: apply automated fixes --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Alem Tuzlak <t.zlak@hotmail.com>
by Tom Beckenham
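The TTS notes in the commit above say that pcm maps to `audio/L16` with a required `rate=` parameter, and that mulaw/alaw switch to `audio/PCMU` / `audio/PCMA` with `;rate=` when a non-default sample rate is requested. The mapping could be sketched as below. The `Codec` union, the assumed 24000 Hz pcm default, the 8000 Hz mulaw/alaw default, and the `audio/wav` entry are illustrative assumptions; only `getContentType` and the MIME strings named in the commit come from the source.

```typescript
// Hypothetical sketch of the content-type mapping; not the real tts.ts code.
type Codec = 'mp3' | 'wav' | 'pcm' | 'mulaw' | 'alaw'

// Assumption: fallback rate used only when the caller gives none for pcm.
const ASSUMED_PCM_RATE = 24000
// Assumption: the rate implied by audio/basic and audio/x-alaw-basic.
const G711_DEFAULT_RATE = 8000

function getContentType(codec: Codec, sampleRate?: number): string {
  switch (codec) {
    case 'mp3':
      return 'audio/mpeg'
    case 'wav':
      return 'audio/wav'
    case 'pcm':
      // audio/L16 requires an explicit rate parameter.
      return `audio/L16;rate=${sampleRate ?? ASSUMED_PCM_RATE}`
    case 'mulaw':
      return sampleRate === undefined || sampleRate === G711_DEFAULT_RATE
        ? 'audio/basic'
        : `audio/PCMU;rate=${sampleRate}`
    case 'alaw':
      return sampleRate === undefined || sampleRate === G711_DEFAULT_RATE
        ? 'audio/x-alaw-basic'
        : `audio/PCMA;rate=${sampleRate}`
  }
}
```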
Succeeded
main
af9eb7bb Audit remediation: tests, isolate hardening, framework fixes, error hygiene (#465) * test(ai-code-mode-skills): add unit test coverage for skill library The package had 13 source files with zero unit tests. Added 116 tests across 9 files covering trust strategies, memory + file storage, skill management tools (including name-validation boundaries for register_skill), bindings, skills-to-tools execution with mocked isolate driver, type generation, the system-prompt renderer, and skill selection with a mocked chat adapter. * feat(ai-isolate-cloudflare): support production deployments and harden tool-name handling The Worker was documented, commented, and configured as if unsafe_eval only worked in wrangler dev. Updated src/worker/index.ts, wrangler.toml, and the README to describe the production path (Cloudflare accounts with the unsafe_eval binding enabled), and pointed users to auth / rate limiting as the real production gate. Also added assertSafeToolName in wrap-code.ts to reject tool names that would break out of the generated function identifier, e.g. "foo'); process.exit(1); (function bar() {". Added tests covering quotes, backticks, whitespace, semicolons, newlines, empty strings, leading digits, and the valid identifier shapes. Added a new escape-attempts.test.ts covering JSON.stringify escaping of adversarial tool-result values and verifying the result lands in a plain object-literal assignment (never a template literal). * refactor(ai-ollama): extract tool-converter with test coverage Tool handling was inlined inside the text adapter with raw type casts. Extracted into src/tools/function-tool.ts + tool-converter.ts matching the structure used by ai-openai, ai-anthropic, ai-grok, and ai-groq. Re-exported as convertFunctionToolToAdapterFormat and convertToolsToProviderFormat from the package index. 
Added 29 unit tests covering the converter, client utilities (createOllamaClient, getOllamaHostFromEnv, generateId, estimateTokens), and the text adapter's streaming behaviour: RUN/TEXT_MESSAGE/tool-call lifecycle events, id synthesis when Ollama omits a tool-call id, tool forwarding to the SDK in provider format, and structured-output JSON parsing with error wrapping. The package previously had 73 source files and zero unit tests. * fix(frameworks): propagate useChat callback changes after re-render onResponse, onChunk, and onCustomEvent were captured by reference at ChatClient creation time. When a parent component re-rendered with fresh closures, the client kept calling the originals. - ai-react / ai-preact: wrap the three callbacks the same way onFinish/onError already were, reading from optionsRef.current at call time. - ai-vue / ai-solid: wrap the callbacks to read options.xxx at call time. This also fixes a subtler bug where using client.updateOptions to swap callbacks could not clear them (the "!== undefined" guard silently skipped undefined values). - ai-svelte: documented the capture-at-creation behaviour — Svelte's createChat runs once per instance and there's no per-render hook, so callbacks are frozen unless the caller mutates the options object or calls client.updateOptions imperatively. Added a React regression test that rerenders with a new onChunk and verifies the new callback fires while the original does not. * refactor(ai, ai-openai): narrow error handling and stop logging raw errors The three catch blocks that convert thrown values into RUN_ERROR events (stream-to-response.ts, activities/stream-generation-result.ts, activities/generateVideo/index.ts) were using catch(error: any) and dereferencing .message / .code without checks. 
Added a shared toRunErrorPayload(error, fallback) helper under activities/ that accepts Error instances, plain objects with message/code fields, or bare strings, and funnels all three sites through it with a per-site fallback message. Removed four console.error calls in the OpenAI text adapter's chatStream that dumped the full error object to stdout. SDK errors can carry the original request (including auth headers), so the library no longer logs them; upstream callers should convert errors into structured events. Added 8 unit tests for toRunErrorPayload including a leaked-properties test confirming the helper does not expose extra fields. * test(isolates): add sandbox escape-attempt tests for Node and QuickJS drivers Covers the attack surface a malicious skill / code-mode snippet might probe: process/require/fetch should be unavailable, prototype pollution must not leak to the host or between contexts, synchronous CPU-spin loops must be interrupted by the timeout (not hang), and Function- constructor escape attempts must execute inside the isolate (never returning a real host process object). QuickJS also gets a test that globalThis mutations inside one context do not bleed into a sibling context. * ci: apply automated fixes * fix: address PR review feedback - ai-preact: forward onCustomEvent in useChat (changeset claimed the fix covered preact but it was silently dropped before reaching ChatClient). - ai-isolate-cloudflare: reject JS reserved keywords as tool names (return, class, function, if, await, ...) so the wrapper fails fast at generation time instead of with a cryptic SyntaxError at eval. - ai/src/activities/error-payload: apply typeof string check to the Error branch's code field, matching the plain-object branch. Some SDKs attach numeric or Symbol codes to Error instances. - ai-ollama text-adapter test: strengthen OLLAMA_HOST assertion by tracking the mocked Ollama constructor args, so the test fails if the env var is ignored. 
- ai-ollama utils test: rename 'when OLLAMA_HOST is unset' to 'empty' since the setup stubs an empty string. - ai-code-mode-skills file-storage test: use vi.useFakeTimers() for the createdAt/updatedAt round-trip instead of a 5ms real sleep. * ci: apply automated fixes * fix(ai, ai-ollama): merge-driven regressions from CR Address CR findings after merging main: - ai-ollama tests: inject testLogger (from resolveDebugOption(false)) into every adapter.chatStream and adapter.structuredOutput call — main's #467 made `logger` required on TextOptions, the PR's new tests were written against the pre-#467 contract and crashed at runtime on `logger.errors`/`logger.request` dereference. - generateVideo: narrow `error` via toRunErrorPayload before handing it to logger.errors. Previously passed the raw error object through the logger meta, which would surface SDK request state (headers, payloads) to any user-supplied logger — defeating the hardening the PR applies to the RUN_ERROR event. - error-narrowing changeset: update wording to match actual code. The OpenAI text adapter's chatStream still logs under the merge, but now through the narrowed `{message, code}` payload rather than raw errors. Changeset previously claimed "the library now re-throws without logging", which didn't match shipped behavior. --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
by Alem Tuzlak
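The error-hygiene commit above mentions a shared `toRunErrorPayload(error, fallback)` helper that accepts Error instances, plain objects with message/code fields, or bare strings, and that applies a `typeof` string check to `code`. A rough sketch of that narrowing is shown here. The `RunErrorPayload` interface and the exact branching are assumptions; the real helper under activities/ may differ.

```typescript
// Hypothetical sketch; the shipped helper may be structured differently.
interface RunErrorPayload {
  message: string
  code?: string
}

function toRunErrorPayload(error: unknown, fallback: string): RunErrorPayload {
  // Bare strings become the message directly (empty strings use the fallback).
  if (typeof error === 'string') {
    return { message: error || fallback }
  }
  // Error instances and plain objects share one branch: read only
  // `message` and `code`, and only when they are strings.
  if (typeof error === 'object' && error !== null) {
    const { message, code } = error as { message?: unknown; code?: unknown }
    return {
      message: typeof message === 'string' && message.length > 0 ? message : fallback,
      ...(typeof code === 'string' ? { code } : {}),
    }
  }
  return { message: fallback }
}
```

Because the result is built from scratch rather than by spreading the input, extra properties on an SDK error (request headers, payloads) cannot reach the emitted event.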
Succeeded
main
af9eb7bb Audit remediation: tests, isolate hardening, framework fixes, error hygiene (#465) * test(ai-code-mode-skills): add unit test coverage for skill library The package had 13 source files with zero unit tests. Added 116 tests across 9 files covering trust strategies, memory + file storage, skill management tools (including name-validation boundaries for register_skill), bindings, skills-to-tools execution with mocked isolate driver, type generation, the system-prompt renderer, and skill selection with a mocked chat adapter. * feat(ai-isolate-cloudflare): support production deployments and harden tool-name handling The Worker was documented, commented, and configured as if unsafe_eval only worked in wrangler dev. Updated src/worker/index.ts, wrangler.toml, and the README to describe the production path (Cloudflare accounts with the unsafe_eval binding enabled), and pointed users to auth / rate limiting as the real production gate. Also added assertSafeToolName in wrap-code.ts to reject tool names that would break out of the generated function identifier, e.g. "foo'); process.exit(1); (function bar() {". Added tests covering quotes, backticks, whitespace, semicolons, newlines, empty strings, leading digits, and the valid identifier shapes. Added a new escape-attempts.test.ts covering JSON.stringify escaping of adversarial tool-result values and verifying the result lands in a plain object-literal assignment (never a template literal). * refactor(ai-ollama): extract tool-converter with test coverage Tool handling was inlined inside the text adapter with raw type casts. Extracted into src/tools/function-tool.ts + tool-converter.ts matching the structure used by ai-openai, ai-anthropic, ai-grok, and ai-groq. Re-exported as convertFunctionToolToAdapterFormat and convertToolsToProviderFormat from the package index. 
Added 29 unit tests covering the converter, client utilities (createOllamaClient, getOllamaHostFromEnv, generateId, estimateTokens), and the text adapter's streaming behaviour: RUN/TEXT_MESSAGE/tool-call lifecycle events, id synthesis when Ollama omits a tool-call id, tool forwarding to the SDK in provider format, and structured-output JSON parsing with error wrapping. The package previously had 73 source files and zero unit tests. * fix(frameworks): propagate useChat callback changes after re-render onResponse, onChunk, and onCustomEvent were captured by reference at ChatClient creation time. When a parent component re-rendered with fresh closures, the client kept calling the originals. - ai-react / ai-preact: wrap the three callbacks the same way onFinish/onError already were, reading from optionsRef.current at call time. - ai-vue / ai-solid: wrap the callbacks to read options.xxx at call time. This also fixes a subtler bug where using client.updateOptions to swap callbacks could not clear them (the "!== undefined" guard silently skipped undefined values). - ai-svelte: documented the capture-at-creation behaviour — Svelte's createChat runs once per instance and there's no per-render hook, so callbacks are frozen unless the caller mutates the options object or calls client.updateOptions imperatively. Added a React regression test that rerenders with a new onChunk and verifies the new callback fires while the original does not. * refactor(ai, ai-openai): narrow error handling and stop logging raw errors The three catch blocks that convert thrown values into RUN_ERROR events (stream-to-response.ts, activities/stream-generation-result.ts, activities/generateVideo/index.ts) were using catch(error: any) and dereferencing .message / .code without checks. 
Added a shared toRunErrorPayload(error, fallback) helper under activities/ that accepts Error instances, plain objects with message/code fields, or bare strings, and funnels all three sites through it with a per-site fallback message. Removed four console.error calls in the OpenAI text adapter's chatStream that dumped the full error object to stdout. SDK errors can carry the original request (including auth headers), so the library no longer logs them; upstream callers should convert errors into structured events. Added 8 unit tests for toRunErrorPayload including a leaked-properties test confirming the helper does not expose extra fields. * test(isolates): add sandbox escape-attempt tests for Node and QuickJS drivers Covers the attack surface a malicious skill / code-mode snippet might probe: process/require/fetch should be unavailable, prototype pollution must not leak to the host or between contexts, synchronous CPU-spin loops must be interrupted by the timeout (not hang), and Function- constructor escape attempts must execute inside the isolate (never returning a real host process object). QuickJS also gets a test that globalThis mutations inside one context do not bleed into a sibling context. * ci: apply automated fixes * fix: address PR review feedback - ai-preact: forward onCustomEvent in useChat (changeset claimed the fix covered preact but it was silently dropped before reaching ChatClient). - ai-isolate-cloudflare: reject JS reserved keywords as tool names (return, class, function, if, await, ...) so the wrapper fails fast at generation time instead of with a cryptic SyntaxError at eval. - ai/src/activities/error-payload: apply typeof string check to the Error branch's code field, matching the plain-object branch. Some SDKs attach numeric or Symbol codes to Error instances. - ai-ollama text-adapter test: strengthen OLLAMA_HOST assertion by tracking the mocked Ollama constructor args, so the test fails if the env var is ignored. 
- ai-ollama utils test: rename 'when OLLAMA_HOST is unset' to 'empty' since the setup stubs an empty string. - ai-code-mode-skills file-storage test: use vi.useFakeTimers() for the createdAt/updatedAt round-trip instead of a 5ms real sleep. * ci: apply automated fixes * fix(ai, ai-ollama): merge-driven regressions from CR Address CR findings after merging main: - ai-ollama tests: inject testLogger (from resolveDebugOption(false)) into every adapter.chatStream and adapter.structuredOutput call — main's #467 made `logger` required on TextOptions, the PR's new tests were written against the pre-#467 contract and crashed at runtime on `logger.errors`/`logger.request` dereference. - generateVideo: narrow `error` via toRunErrorPayload before handing it to logger.errors. Previously passed the raw error object through the logger meta, which would surface SDK request state (headers, payloads) to any user-supplied logger — defeating the hardening the PR applies to the RUN_ERROR event. - error-narrowing changeset: update wording to match actual code. The OpenAI text adapter's chatStream still logs under the merge, but now through the narrowed `{message, code}` payload rather than raw errors. Changeset previously claimed "the library now re-throws without logging", which didn't match shipped behavior. --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
by Alem Tuzlak
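The same commit adds `assertSafeToolName` to reject tool names that would break out of the generated function identifier, including reserved keywords. One way to express that check is sketched below. The regular expression, the abbreviated keyword list, and the error wording are assumptions, not the contents of wrap-code.ts.

```typescript
// Hypothetical sketch; the keyword list is abbreviated and the pattern is assumed.
const RESERVED_WORDS = new Set([
  'return', 'class', 'function', 'if', 'else', 'await', 'var', 'let',
  'const', 'new', 'delete', 'typeof', 'while', 'for', 'switch', 'yield',
])

// ASCII-only identifier shape: no quotes, whitespace, semicolons, or leading digits.
const IDENTIFIER_PATTERN = /^[A-Za-z_$][A-Za-z0-9_$]*$/

function assertSafeToolName(name: string): void {
  if (!IDENTIFIER_PATTERN.test(name)) {
    throw new Error(`Unsafe tool name: ${JSON.stringify(name)}`)
  }
  if (RESERVED_WORDS.has(name)) {
    throw new Error(`Tool name is a reserved word: ${name}`)
  }
}
```

Validating at generation time means a bad name fails with a descriptive message instead of a SyntaxError at eval.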
Succeeded
main
008f0154 fix(ai-client): prevent drainPostStreamActions re-entrancy race condition (#429) * fix(ai-client): prevent drainPostStreamActions re-entrancy stealing queued actions When multiple client tools complete in the same round, each addToolResult() queues a checkForContinuation action. The first drain executes one action which calls streamResponse(), whose finally block calls drainPostStreamActions() again (nested). The inner drain steals the remaining actions, permanently stalling the conversation. Add a draining flag to skip nested drain calls. The outer drain processes all actions sequentially, preventing action theft. Also fix shouldAutoSend() to require at least one tool call in the last assistant message. Previously it returned true for text-only responses (areAllToolsComplete() returns true when toolParts.length === 0), causing the second queued checkForContinuation action to incorrectly trigger an extra continuation round and produce duplicate content. Fixes #302 * ci: apply automated fixes * changeset: fix drain post-stream re-entrancy * fix: resolve type errors in drain re-entrancy test * test: add e2e regression test for drain re-entrancy stall (#302) Add a Playwright e2e test that verifies parallel client tools complete and the continuation fires with a follow-up text response. Without the drainPostStreamActions() re-entrancy guard, nested drain calls steal queued actions and permanently stall the conversation after both tools complete. The test asserts that the follow-up text "All displayed" arrives, which would time out without the fix. * ci: apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
by Alem Tuzlak
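The re-entrancy guard described in the commit above can be sketched as a small queue with a `draining` flag. This is a minimal sketch under assumptions: `PostStreamQueue`, `enqueue`, and `drain` are hypothetical names chosen for illustration, and the real client code is not reproduced here.

```typescript
type Action = () => Promise<void>

// Hypothetical stand-in for the client's post-stream action queue.
class PostStreamQueue {
  private actions: Action[] = []
  private draining = false

  enqueue(action: Action): void {
    this.actions.push(action)
  }

  async drain(): Promise<void> {
    // A nested call (for example from an action's own cleanup path) returns
    // immediately, leaving the remaining actions to the outer drain.
    if (this.draining) return
    this.draining = true
    try {
      // Process queued actions one at a time, in order.
      while (this.actions.length > 0) {
        const next = this.actions.shift()!
        await next()
      }
    } finally {
      this.draining = false
    }
  }
}
```

With the flag in place, an action that itself calls `drain()` while running does not take the remaining queued actions; the outer loop still runs them in order once that action finishes.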
Succeeded
main
008f0154 fix(ai-client): prevent drainPostStreamActions re-entrancy race condition (#429) * fix(ai-client): prevent drainPostStreamActions re-entrancy stealing queued actions When multiple client tools complete in the same round, each addToolResult() queues a checkForContinuation action. The first drain executes one action which calls streamResponse(), whose finally block calls drainPostStreamActions() again (nested). The inner drain steals the remaining actions, permanently stalling the conversation. Add a draining flag to skip nested drain calls. The outer drain processes all actions sequentially, preventing action theft. Also fix shouldAutoSend() to require at least one tool call in the last assistant message. Previously it returned true for text-only responses (areAllToolsComplete() returns true when toolParts.length === 0), causing the second queued checkForContinuation action to incorrectly trigger an extra continuation round and produce duplicate content. Fixes #302 * ci: apply automated fixes * changeset: fix drain post-stream re-entrancy * fix: resolve type errors in drain re-entrancy test * test: add e2e regression test for drain re-entrancy stall (#302) Add a Playwright e2e test that verifies parallel client tools complete and the continuation fires with a follow-up text response. Without the drainPostStreamActions() re-entrancy guard, nested drain calls steal queued actions and permanently stall the conversation after both tools complete. The test asserts that the follow-up text "All displayed" arrives, which would time out without the fix. * ci: apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
by Alem Tuzlak
Succeeded
main
2832426a ci: Version Packages (#493) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
by github-act...
Succeeded
main
2832426a ci: Version Packages (#493) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
by github-act...
Succeeded
main
dc71c721 feat(examples): add AI-powered search example (#114) * feat(examples): add AI-powered search example - Introduces a comprehensive React example demonstrating natural language search capabilities - Users can query merchant data (orders, disputes, settlements) using conversational language like "show me orders from last week" - AI converts natural language prompts into structured search parameters with proper filtering and date ranges - Includes full UI with data tables, filters, and responsive design using Tailwind CSS - Leverages TanStack Start, TanStack Router, TanStack AI * feat(examples/ts-react-search): add navigation component to hero section - Added a new Navigation component with links to Home, Orders, Disputes, and Settlements pages - Integrated navigation into the hero section for improved user experience * refactor(navigation): simplify route references in Navigation component - Replace imported route objects with hardcoded string paths - Remove unused route imports * refactor(routes): restructure API search route into directory - Moved api.search.ts to api/search.ts for better organization - Updated route tree imports to reflect new file structure - Maintains existing functionality while improving code organization * feat(examples): integrate TanStack DB for client-side data management - Replace server functions with TanStack DB collections and live queries - Add @tanstack/react-db, @tanstack/query-db-collection, and related packages - Implement disputes, orders, and settlements collections with Zod validation - Create useLiveQuery hooks for reactive data filtering and searching - Update components to use client-side collections instead of server functions * feat(examples): update search API to use server-sent events streaming - Migrated from `toStreamResponse` to `toServerSentEventsResponse` for improved streaming - Updated OpenAI adapter to use `openaiText` with model specification - Updated multiple dependencies including React, TanStack 
Router, and TailwindCSS * refactor(search): migrate search API from streaming to synchronous with structured validation - Replaced Server-Sent Events with React Query mutation for search requests - Added Zod schema validation for structured output in search API - Updated search component to handle JSON responses instead of streaming - Improved error handling and type safety for search parameters * refactor(search): extract search mutation logic into reusable hook - Moved search API mutation logic from Search component into dedicated useSearchMutation hook - Improves code reusability and separation of concerns - Enables search functionality to be used across multiple components - Reduces code duplication and improves maintainability * build(ts-react-search): bump TanStack Router stack and refresh lockfile - Align the ts-react-search example with newer @tanstack/react-router, react-start, router-plugin, and devtools packages so it stays compatible with current TanStack releases - Regenerated route tree types so the layout route reports fullPath as '/' instead of an empty string, matching the updated router codegen - Updated the workspace lockfile so installs resolve consistently with the new dependency graph * feat(ts-react-search): switch AI search adapter from OpenAI to Groq - Example depends on @tanstack/ai-groq and uses groqText with openai/gpt-oss-20b - Server checks GROQ_API_KEY instead of OPENAI_API_KEY - Output schema is wrapped with toGroqCompatibleSchema so Groq accepts JSON Schema unions (additionalProperties on anyOf) * ci: apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
by Denis Shvets
Succeeded
main
dc71c721 feat(examples): add AI-powered search example (#114) * feat(examples): add AI-powered search example - Introduces a comprehensive React example demonstrating natural language search capabilities - Users can query merchant data (orders, disputes, settlements) using conversational language like "show me orders from last week" - AI converts natural language prompts into structured search parameters with proper filtering and date ranges - Includes full UI with data tables, filters, and responsive design using Tailwind CSS - Leverages TanStack Start, TanStack Router, TanStack AI * feat(examples/ts-react-search): add navigation component to hero section - Added a new Navigation component with links to Home, Orders, Disputes, and Settlements pages - Integrated navigation into the hero section for improved user experience * refactor(navigation): simplify route references in Navigation component - Replace imported route objects with hardcoded string paths - Remove unused route imports * refactor(routes): restructure API search route into directory - Moved api.search.ts to api/search.ts for better organization - Updated route tree imports to reflect new file structure - Maintains existing functionality while improving code organization * feat(examples): integrate TanStack DB for client-side data management - Replace server functions with TanStack DB collections and live queries - Add @tanstack/react-db, @tanstack/query-db-collection, and related packages - Implement disputes, orders, and settlements collections with Zod validation - Create useLiveQuery hooks for reactive data filtering and searching - Update components to use client-side collections instead of server functions * feat(examples): update search API to use server-sent events streaming - Migrated from `toStreamResponse` to `toServerSentEventsResponse` for improved streaming - Updated OpenAI adapter to use `openaiText` with model specification - Updated multiple dependencies including React, TanStack 
Router, and TailwindCSS * refactor(search): migrate search API from streaming to synchronous with structured validation - Replaced Server-Sent Events with React Query mutation for search requests - Added Zod schema validation for structured output in search API - Updated search component to handle JSON responses instead of streaming - Improved error handling and type safety for search parameters * refactor(search): extract search mutation logic into reusable hook - Moved search API mutation logic from Search component into dedicated useSearchMutation hook - Improves code reusability and separation of concerns - Enables search functionality to be used across multiple components - Reduces code duplication and improves maintainability * build(ts-react-search): bump TanStack Router stack and refresh lockfile - Align the ts-react-search example with newer @tanstack/react-router, react-start, router-plugin, and devtools packages so it stays compatible with current TanStack releases - Regenerated route tree types so the layout route reports fullPath as '/' instead of an empty string, matching the updated router codegen - Updated the workspace lockfile so installs resolve consistently with the new dependency graph * feat(ts-react-search): switch AI search adapter from OpenAI to Groq - Example depends on @tanstack/ai-groq and uses groqText with openai/gpt-oss-20b - Server checks GROQ_API_KEY instead of OPENAI_API_KEY - Output schema is wrapped with toGroqCompatibleSchema so Groq accepts JSON Schema unions (additionalProperties on anyOf) * ci: apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
by Denis Shvets
Succeeded
main
dc71c721 feat(examples): add AI-powered search example (#114) * feat(examples): add AI-powered search example - Introduces a comprehensive React example demonstrating natural language search capabilities - Users can query merchant data (orders, disputes, settlements) using conversational language like "show me orders from last week" - AI converts natural language prompts into structured search parameters with proper filtering and date ranges - Includes full UI with data tables, filters, and responsive design using Tailwind CSS - Leverages TanStack Start, TanStack Router, TanStack AI * feat(examples/ts-react-search): add navigation component to hero section - Added a new Navigation component with links to Home, Orders, Disputes, and Settlements pages - Integrated navigation into the hero section for improved user experience * refactor(navigation): simplify route references in Navigation component - Replace imported route objects with hardcoded string paths - Remove unused route imports * refactor(routes): restructure API search route into directory - Moved api.search.ts to api/search.ts for better organization - Updated route tree imports to reflect new file structure - Maintains existing functionality while improving code organization * feat(examples): integrate TanStack DB for client-side data management - Replace server functions with TanStack DB collections and live queries - Add @tanstack/react-db, @tanstack/query-db-collection, and related packages - Implement disputes, orders, and settlements collections with Zod validation - Create useLiveQuery hooks for reactive data filtering and searching - Update components to use client-side collections instead of server functions * feat(examples): update search API to use server-sent events streaming - Migrated from `toStreamResponse` to `toServerSentEventsResponse` for improved streaming - Updated OpenAI adapter to use `openaiText` with model specification - Updated multiple dependencies including React, TanStack 
Router, and TailwindCSS * refactor(search): migrate search API from streaming to synchronous with structured validation - Replaced Server-Sent Events with React Query mutation for search requests - Added Zod schema validation for structured output in search API - Updated search component to handle JSON responses instead of streaming - Improved error handling and type safety for search parameters * refactor(search): extract search mutation logic into reusable hook - Moved search API mutation logic from Search component into dedicated useSearchMutation hook - Improves code reusability and separation of concerns - Enables search functionality to be used across multiple components - Reduces code duplication and improves maintainability * build(ts-react-search): bump TanStack Router stack and refresh lockfile - Align the ts-react-search example with newer @tanstack/react-router, react-start, router-plugin, and devtools packages so it stays compatible with current TanStack releases - Regenerated route tree types so the layout route reports fullPath as '/' instead of an empty string, matching the updated router codegen - Updated the workspace lockfile so installs resolve consistently with the new dependency graph * feat(ts-react-search): switch AI search adapter from OpenAI to Groq - Example depends on @tanstack/ai-groq and uses groqText with openai/gpt-oss-20b - Server checks GROQ_API_KEY instead of OPENAI_API_KEY - Output schema is wrapped with toGroqCompatibleSchema so Groq accepts JSON Schema unions (additionalProperties on anyOf) * ci: apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
by Denis Shvets
Succeeded
main
54523f5e feat: audio media support — fal audio/speech/STT adapters, Gemini Lyria + 3.1 Flash TTS, streaming generateAudio + hooks (#463) * feat: add fal audio, speech, and transcription adapters Adds falSpeech, falTranscription, and falAudio adapters to @tanstack/ai-fal, completing fal's media coverage alongside image and video. Introduces a new generateAudio activity in @tanstack/ai for music and sound-effect generation, with matching devtools events and types. Closes #328 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: add ElevenLabs TTS/music/SFX/transcription adapters and Gemini Lyria + 3.1 Flash TTS Extends @tanstack/ai-elevenlabs (which already covers realtime voice) with Speech, Music, Sound Effects, and Transcription adapters, each tree-shakeable under its own import. Adds Gemini Lyria 3 Pro / Clip music generation via a new generateAudio adapter, plus the new Gemini 3.1 Flash TTS Preview model with multi-speaker dialogue support. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: document fal audio, speech, and transcription adapters Adds a new Audio Generation page, expands the fal adapter reference with sections for text-to-speech, transcription, and audio/music, and adds fal sections to the Text-to-Speech and Transcription guides. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: add example pages and tests for audio/tts providers Expand the ts-react-chat example with provider tabs for OpenAI, ElevenLabs, Gemini, and Fal on the TTS and transcription pages, plus a new /generations/audio page covering ElevenLabs Music, ElevenLabs SFX, Gemini Lyria, and Fal audio generation. Add a Gemini TTS unit test and wire an audio-gen feature into the E2E harness (adapter factory, API route, UI, fixture, and Playwright spec). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: apply automated fixes * docs: lead audio generation guide with Gemini and ElevenLabs Reorder the Audio Generation page so the direct Gemini (Lyria) and ElevenLabs (music/sfx) adapters appear before fal.ai, and update the environment variables + result-shape notes to cover all three providers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ts-react-chat): add audio home tile, sample prompts, and fal model selector Expose an Audio tile on the welcome grid, offer one-click sample prompts for every audio provider, and let the Fal provider pick between current text-to-music models (default MiniMax v2.6). Threads a model override through the audio API and server fn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: apply automated fixes * chore: split ElevenLabs audio adapters out to separate PR (#485) Moves the new ElevenLabs TTS / Music / SFX / Transcription REST adapters out of this PR into their own issue (#485) and branch (`elevenlabs-audio-adapters`) so the fal + Gemini audio work can ship independently. The follow-up PR will rebuild these adapters on top of the official `@elevenlabs/elevenlabs-js` SDK rather than hand-rolled fetch calls. Removed from this branch: - `packages/typescript/ai-elevenlabs/src/{adapters,utils,model-meta.ts}` and their tests (realtime voice code untouched) - ElevenLabs sections in `docs/media/audio-generation.md` - ElevenLabs entries in `examples/ts-react-chat` audio-providers catalog, server adapter factories, zod schemas, and default provider wiring - `@tanstack/ai-elevenlabs` bump from the audio changeset Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: apply automated fixes * fix(ai-fal, ai-gemini): audio adapter bug fixes - ai-fal: replace `btoa(String.fromCharCode(...bytes))` with a chunked helper; the spread form throws RangeError on any realistic TTS clip (V8 arg limit ~65k). 
- ai-gemini: honor `TTSOptions.voice` as a fallback for the prebuilt voice name, move `systemInstruction` inside `config` per the @google/genai contract, and wrap raw `audio/L16;codec=pcm` output in a RIFF/WAV container so the result is actually playable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(ts-react-chat): warn on rejected audio model overrides Log a warning instead of silently swapping to the default when a client sends a model id outside the provider's allowlist, so stale clients or typo'd config ids are debuggable. Also correct the AudioProviderConfig JSDoc to describe the models[] ordering as a non-binding UI convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: split generateAudio into generateMusic and generateSoundEffects Replaces the unreleased generateAudio activity with two distinct activities so music and sound-effects each have their own types, adapter kinds, provider factories, and devtools events. This lets providers advertise only the capabilities they support (Gemini Lyria is music-only; fal has distinct music and SFX catalogs) and leaves room for kind-specific options without a breaking change. 
- Core: generateMusic/generateSoundEffects activities and MusicAdapter/ SoundEffectsAdapter interfaces + bases; GeneratedAudio shared between MusicGenerationResult and SoundEffectsGenerationResult - Events: music:request:* and soundEffects:request:* replace audio:* - fal: falMusic + falSoundEffects factories sharing internal request/response helpers; FalMusic/FalSoundEffectsProviderOptions in model-meta - Gemini: geminiMusic/createGeminiMusic/GeminiMusicAdapter (Lyria is music-only so no SFX counterpart) - ts-react-chat: /generations/music and /generations/sound-effects routes backed by a shared AudioGenerationForm; split server fns and API routes - E2E: music-gen + sound-effects-gen features, parameterized MediaAudioGenUI, split fixtures and specs (both feature support sets are empty since aimock 1.14 cannot mock Gemini's Lyria AUDIO modality) - Docs: music-generation.md + sound-effects-generation.md; fal adapter docs split; changesets rewritten in place Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fixed type issue * Delete terminal output * revert: restore single generateAudio activity Supersedes 1010e9b7. The split into generateMusic + generateSoundEffects doesn't hold up against fal's audio catalog: dozens of models span audio-to-audio, voice-change/clone, enhancement, separation, isolation, merge, and understanding, and individual models (e.g. stable-audio-25) generate music AND sound effects. A single broader generateAudio activity fits that reality. Keeps the aimock Gemini-Lyria gap: audio-gen feature-support stays empty because aimock 1.14 has no AUDIO-modality mock for generateContent — the E2E is green by skipping rather than by hitting a mock that doesn't exist. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: enforce exactly one of url or b64Json on GeneratedImage and GeneratedAudio Model GeneratedImage and GeneratedAudio on a shared mutually-exclusive GeneratedMediaSource union so the type rejects empty objects and objects that set both fields. Update the openai, gemini, grok, openrouter, and fal image adapters to construct results by branching on which field is present; openrouter and fal no longer synthesize a data URI on url when returning base64. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: apply automated fixes * chore(e2e): drop audio-gen scaffolding pending aimock support The audio-gen feature set was empty because aimock cannot currently mock audio generation, so the Playwright spec ran against zero providers. Remove the dead scaffolding; the wiring can return once aimock audio support lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: add useGenerateAudio hook and streaming support for generateAudio Closes the parity gap with the other media activities — audio generation now has the same client-hook UX (connection + fetcher transports) as image, speech, video, transcription, and summarize. Adds streaming to generateAudio so it can ride the SSE transport, a matching AudioGenerateInput type in ai-client, framework hooks in ai-react / ai-solid / ai-vue / ai-svelte, unit tests, an updated ts-react-chat example, and docs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ai-fal): translate duration per audio model Fal audio models use different input field names for length: ElevenLabs Music takes `music_length_ms` in milliseconds, Stable Audio 2.5 takes `seconds_total`, and most others accept `duration`. The adapter was passing a generic `duration` unconditionally, so the slider in the example was silently ignored for ElevenLabs and Stable Audio. 
Also: align the Gemini Lyria adapter with the API's MP3 default (only send responseMimeType when the caller asks for WAV), expand the example to include Lyria 3 Pro and a dedicated Fal SFX provider, and rename the example's "Direct" mode to "Hooks" to better reflect what it demos. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(ai-gemini): rename GEMINI_LYRIA_MODELS to GEMINI_AUDIO_MODELS Align the audio model constant and its re-export with the `generateAudio` activity naming used across providers, and drop the unused duplicate `GeminiLyriaModel` type — `GeminiAudioModel` is the single canonical type. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ai-gemini): address CR findings — constructor config, TTS model name, PCM channels, voice validation, image error surfacing * fix(ai-fal): address CR findings — generateId entropy, fetch.ok guards, response-shape validation, size params, proxy+apiKey, content types * fix(ai-fal): throw on unknown image response shape instead of returning empty * fix(ai-image-adapters): fix double-wrapped errors, duplicate keys, signature mismatch, null guards * fix(ai-gemini): address CR findings — test import, image model output meta, option filtering * fix(example-ts-react-chat): blob URL revocation, route link, body validation, falsy duration render * fix(ai-core): emit adapter-error events, consistent async, reordered base adapter ctor, type sync * fix(ai-openrouter): drop redundant null guards that TS types already enforce The defensive nullish-coalescing on response.choices and img/img.imageUrl guards that the fix-loop added are impossible per the SDK type signatures; eslint's no-unnecessary-condition correctly rejects them. Keep only the typeof url !== 'string' check, which is a real runtime shape guard (imageUrl.url is typed as string but provider may send a non-string in rare degraded responses). 
* fix: address CodeRabbit review feedback — SSE types, mime normalization, voice validation, etc. Applies the reviewer-flagged changes that weren't load-bearing for the merge: - event-client: AudioRequestCompletedEvent.audio is now a mutually-exclusive {url; never b64Json} | {b64Json; never url} union so consumers can't read both fields simultaneously, mirroring the GeneratedAudio contract in core. - fal utils: extractUrlExtension now strips URL fragments and trailing slashes, parses via the URL API so a TLD like `.com` isn't mistaken for an extension, and only inspects the final path segment. - fal utils: deriveAudioContentType returns `audio/aac` for aac, separated from the `m4a`/`mp4` → `audio/mp4` case. - fal speech: prefer URL-derived extension when deriving `format`, and normalize `mpeg` → `mp3` so the field is a usable file extension. - gemini audio: drop `negativePrompt` (not accepted by GenerateContentConfig) and `responseMimeType` (Lyria Clip rejects it, Pro returns MP3 by default) from the public provider options surface, and document that the generic `duration` option is ignored by Lyria (Clip is fixed at 30s, Pro takes duration via the natural-language prompt). - gemini tts: multiSpeakerVoiceConfig.speakerVoiceConfigs length is now validated (1 or 2 speakers), partial user-supplied voiceConfig correctly falls back to the standard voice/'Kore' default, parsePcmMimeType tightens detection to exclude subtypes containing "wav" so containerized `audio/wav;codec=pcm` is no longer re-wrapped, and createGeminiSpeech / createGeminiAudio factory functions now spread config before the explicit apiKey argument so caller config can't silently override the API key. - ts-react-chat API routes: replace zod 4's removed `.flatten()` with `z.treeifyError()` for validation error details. - ts-react-chat audio route: `toAudioOutput` returns `null` per the `onResult` hook contract instead of throwing synchronously — failures are still surfaced via the hook's error state. 
- Updates the tests affected by the above behavior changes. * docs: document debug logging for new audio/speech/transcription activities - debug-logging.md: list generateAudio/generateTranscription in Non-chat activities section; clarify that the `provider` category now applies to streaming generateAudio/generateSpeech/generateTranscription calls too. - audio-generation.md, text-to-speech.md, transcription.md: add a single contextual callout at the moment a builder is most likely to need it (immediately before the Options table / next to Error Handling), pointing to the debug-logging guide. * docs(skill): add audio/speech CR gotchas + debug-logging to media-generation skill Agents hitting the new generateAudio/generateSpeech/generateTranscription activities will run into: - Gemini Lyria doesn't accept responseMimeType or negativePrompt via GenerateContentConfig — shape the prompt instead. - Lyria 3 Clip is fixed 30s; Lyria 3 Pro reads duration from natural-language in the prompt, not the duration option. fal audio maps duration per-model. - Gemini TTS multiSpeakerVoiceConfig is validated to 1 or 2 speakers. - debug: DebugOption is threaded through every generate*() activity — reach for it instead of writing logging middleware. Adds four Common Mistake entries, sources the debug-logging doc, and cross-references the ai-core/debug-logging sub-skill. * fix(ai-fal): decode data URL audio inputs to Blob for transcription fal-client auto-uploads Blob/File inputs via fal.storage.upload but passes strings through unchanged, so data URLs reached fal's API and got rejected with 422 "Unsupported data URL". Decode data URL strings to a Blob in buildInput so the auto-upload path handles them; plain http(s) URLs still pass through. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: regenerate API documentation (#494) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Alem Tuzlak <t.zlak@hotmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
by Tom Beckenham
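One of the fixes in the commit above replaces `btoa(String.fromCharCode(...bytes))` with a chunked helper, because spreading a large byte array into a single call can throw a RangeError. A minimal sketch of that idea follows; `bytesToBase64` and the default chunk size are assumptions for illustration and are not taken from the adapter's source.

```typescript
// Hypothetical helper: converts bytes to base64 without passing the whole
// array as arguments to a single String.fromCharCode call.
function bytesToBase64(bytes: Uint8Array, chunkSize = 0x8000): string {
  let binary = ''
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // Each chunk stays well below the engine's argument-count limit.
    const chunk = bytes.subarray(i, i + chunkSize)
    binary += String.fromCharCode.apply(null, Array.from(chunk))
  }
  // btoa is a global in browsers and in recent Node.js versions.
  return btoa(binary)
}
```

The chunk size only bounds how many arguments each call receives; the output is the same for any positive chunk size because the binary string is fully assembled before encoding.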