Overview - Ai

ai

Latest CI Pipeline Executions

Succeeded
feat/activity-observers
3a739e70 ci: apply automated fixes
by autofix-ci...
Succeeded
feat/activity-observers
3a739e70 Merge c5bfb044d0036a6380078555fa3564fe3e8a4d9f into 1047e40856e09f1df070223068210eaf1157d33c
by Season Saw
Succeeded
feat/activity-observers
3a347c3b feat: multimodal prompt for generateImage/generateVideo (image-to-image, image-to-video) (#624)

* feat(ai): add imageInputs / videoInputs / audioInputs for image-conditioned generation (closes #618)

  Adds optional `imageInputs`, `videoInputs`, and `audioInputs` to `generateImage()` and `generateVideo()` for image-to-image, multi-reference, mask / inpaint, image-to-video, and starting-frame flows. Each input part may carry a `metadata.role` hint (`'reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character'`) that adapters use to route to the provider-specific field.

  Provider behavior:
  - OpenAI image: gpt-image-1 / -mini route to `images.edit()` (up to 16 + mask); dall-e-2 routes to `images.edit()` with one source; dall-e-3 throws.
  - OpenAI video: Sora-2 / -pro accept a single `input_reference`; throws on >1.
  - Gemini: native models receive inputs as multimodal `contents` parts; Imagen throws (text-only).
  - fal: 1 input → `image_url`, >1 → `image_urls`; metadata roles map to `mask_url` / `control_image_url` / `reference_image_urls`; video adds `start_image_url` / `end_image_url`. Interim mapping until the fal schemas library lands.
  - Grok, OpenRouter: throw with a link back to #618 (pending native Imagine API rewrite and multimodal injection work respectively).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: apply automated fixes

* feat(ai-fal): resolve image-input fields per endpoint from generated SDK type map

  Replace the fal image-input field heuristic with a per-endpoint mapping generated from @fal-ai/client's EndpointTypeMap (scripts/generate-fal-image-field-map.ts, run via pnpm generate:fal-image-fields). The committed artifact stores only the 362 endpoints whose field names deviate from the defaults (e.g. nano-banana edit -> image_urls, Kling i2v start frame -> image_url, Veo first-last-frame -> first_frame_url / last_frame_url, Fooocus masks -> mask_image_url); the old heuristic remains the fallback for endpoints newer than the installed SDK.

  Safety rails: the generated file `satisfies`-checks every field name against the SDK endpoint types (type-only, erased at runtime), and a unit test hashes the installed endpoints.d.ts against the recorded hash so an SDK bump without regeneration fails test:lib with the regen command.

  Mappers are now typed: both return FalImageInputFields<TModel>, Pick'ed from the endpoint's real input type via a generated field-name union. Roles resolving to the same list field merge (source + reference on nano-banana); colliding scalar fields throw instead of overwriting.

  Also fixes the remaining CI lint failures: duplicate @tanstack/ai import and non-null assertion in ai-fal video.ts, switch-exhaustiveness errors in image-inputs.ts (restructured away), and the non-null assertion in ai-openai image.ts.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ai-grok,ai-openrouter): support imageInputs for image-conditioned generation

  Grok: add the xAI Imagine API image models (grok-imagine-image, grok-imagine-image-quality) to model-meta. With imageInputs they route to xAI's JSON POST /v1/images/edits endpoint via direct fetch (the OpenAI SDK's images.edit() sends multipart/form-data, which xAI rejects): a single input as image:{url}, 2-3 inputs as images:[...] referenceable in the prompt as <IMAGE_0>/<IMAGE_1>; >3 inputs and mask/control roles throw. Their generic `size` uses an aspectRatio_resolution template ('16:9_2k', suffix optional), mirroring Gemini's native image models, and maps to the Imagine aspect_ratio/resolution parameters on both the generate and edit paths. grok-2-image-1212 stays text-to-image only with a clear error.

  OpenRouter: imageInputs are injected as multimodal image_url content parts alongside the prompt in the chat-completions message and forwarded to the underlying image model.

  Neither path fetches or base64-encodes URL sources in-process: URLs pass through verbatim and are fetched by the provider; data sources become data URIs.

  Bumps ai-grok and ai-openrouter to minor in the existing changeset.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: adapt #618 branch to the packages/ restructure and post-rebase API drift

  - Move the generated fal image-field map and the generator's paths from packages/typescript/ai-fal to packages/ai-fal (repo flattened the layout)
  - Add gpt-image-2 to EDIT_MAX_IMAGES (new model on main; same 16-image edit limit as the other gpt-image models)
  - Map edit-path usage through buildImagesUsage to match the new TokenUsage shape, and drop two now-unnecessary type assertions

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ai): make prompt multimodal for generateImage/generateVideo, pass text through verbatim

  Replace the imageInputs / videoInputs / audioInputs fields with a multimodal prompt: string | MediaPromptPart[]. Part order is meaningful: natively multimodal providers (Gemini, OpenRouter) receive parts in interleaved order; named-field providers (OpenAI, fal, xAI) extract media parts via the new resolveMediaPrompt() utility and flatten the text.

  Zero magic: prompt text is always sent verbatim. The SDK never injects or rewrites in-prompt referencing markers; users write each provider's own convention (fal Kling/Seedance @Image1, OpenAI/FLUX.2 "image 1" prose, Gemini content descriptions), now documented per provider in the media docs. An earlier grok <IMAGE_n> auto-injection was removed after research showed the convention is absent from xAI's official docs (images are addressed by request order).

  - Per-model compile-time prompt narrowing via TModelInputModalitiesByName adapter generic (e.g. dall-e-3 / Imagen reject image parts as a type error); fal modality maps are derived at the type level from the SDK's endpoint input types
  - metadata.tag added as an informational label (never read by adapters)
  - Gemini now preserves true interleaving in contents; OpenRouter maps parts 1:1 onto chat content parts in order

  Closes #618

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: address PR review findings for image/video input support

  - openai: add gpt-image-2 to the editImages error message and JSDoc (the model is edit-capable via EDIT_MAX_IMAGES but was omitted from user-facing guidance); same fix in docs, SKILL.md, and the changeset
  - openai: throw when the images.edit() response contains no usable images (matching grok's guard) instead of resolving to { images: [] }
  - openai: drop the unnecessary input_reference cast in the Sora adapter; the SDK types the field, so assign directly
  - fal: reject metadata.role 'mask'/'control' in the video mapper instead of silently folding them into source frames
  - docs: mark Veo role mappings as planned (no Veo adapter yet), note the Gemini ~14-image limit is provider-side, bump samples to gpt-image-2
  - tests: cover the Gemini image-conditioned path (interleaved contents, fileData vs inlineData vs fetch+inline, Imagen/video/audio rejection), the Sora input_reference upload and guards (new file), the fal video createVideoJob field assembly and audio guard, and the openai empty-edit-response guard

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(ai-openai): throw on empty generateImages responses too

  Same defect class as the editImages guard in the previous commit: the text-to-image path silently resolved to { images: [] } when response items had neither b64_json nor url. Surface it as an error instead.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: client-side multimodal prompts, e2e coverage, media example, fal field demotion

  - ai-client: widen ImageGenerateInput.prompt / VideoGenerateInput.prompt from string to MediaPrompt so useGenerateImage/useGenerateVideo can carry image parts from the browser; re-export the MediaPrompt types from @tanstack/ai/client
  - ai-fal: demote media-conditioning fields (FalImageFieldName set plus video_url/video_urls/reference_video_urls/audio_url) from required to optional in FalImageProviderOptions / FalVideoProviderOptions. i2v endpoints declare e.g. image_url as required, but with a multimodal prompt the start frame arrives as a prompt part; modelOptions stays available as the explicit escape hatch
  - e2e: real coverage for image-to-image (OpenAI /v1/images/edits) and image-to-video (Sora multipart /v1/videos with input_reference). The installed aimock 1.29 mocks both multipart endpoints, so the previous "aimock can't mock this" empty provider sets were stale. New specs run all three transports and assert via aimock's request journal that the expected wire endpoint was hit. ImageGenUI/VideoGenUI gain a file input, feature routing/fixtures/onVideo registration added, README matrix updated
  - examples/ts-react-media: ImageGenerator gains a multi-image reference picker (Gemini native models); VideoGenerator sends the start frame as a prompt part with role 'start_frame' instead of modelOptions URLs; server functions narrow the wire prompt per model and throw on unsupported part kinds instead of dropping them

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review findings

  - fal image/video: spread modelOptions after derived media fields so explicit user overrides win (matches documented intent)
  - openai video: validate effective size (size ?? modelOptions.size)
  - generate-fal-image-field-map: run arity check for default-selected fields too
  - ts-react-media example: correct reference-image support comment (Gemini multimodal models, not NanoBanana)
  - e2e VideoGenUI: reject on malformed data URL instead of resolving ''

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ai,ai-gemini): add Google Veo video adapter on the typed-duration contract (#634)

  Restacked on 618-image-to-image-and-image-to-video-support to adopt the multimodal MediaPrompt format, carrying a minimal additive port of the #534 typed-duration contract:

  - @tanstack/ai (non-breaking): VideoAdapter/BaseVideoAdapter gain a TModelDurationByName generic (default Record<string, number> preserves existing duration?: number typing), DurationOptions, snapToDurationOption, and default availableDurations()/snapDuration() implementations. generateVideo's duration is typed via VideoDurationForAdapter.
  - @tanstack/ai-gemini: GeminiVideoAdapter over generateVideos / getVideosOperation with per-model typed durations (Veo 3.x 4|6|8, Veo 2 5|6|8 per current Veo docs), MediaPrompt image routing (start_frame → image, end_frame → lastFrame, reference/character → referenceImages), RAI filter surfacing, geminiVideo/createGeminiVideo factories, and finalized Veo model-meta entries.
  - E2E: gemini added to video-gen with a custom aimock mount for :predictLongRunning + operations polling; all transports pass.
  - Docs + media-generation skill updated for Veo (typed durations, image-to-video role table).

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
by Tom Beckenham
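
The commit above says named-field providers extract media parts from the multimodal prompt and flatten the text, and that the fal default maps one input to `image_url` and several to `image_urls`. The sketch below illustrates that idea only. Every type and function name in it (`PromptPart`, `splitPrompt`, `toFalStyleFields`) is hypothetical, the part shapes are assumed rather than taken from the SDK, and joining text parts with a single space is an assumption; it is not the library's `resolveMediaPrompt()` implementation.

```typescript
// Hypothetical part shapes; the SDK's real MediaPromptPart type may differ.
type Role = 'reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character'
type TextPart = { type: 'text'; text: string }
type ImagePart = { type: 'image'; url: string; metadata?: { role?: Role } }
type PromptPart = TextPart | ImagePart
type Prompt = string | PromptPart[]

// Separate media parts from text parts, keeping each group in prompt order.
function splitPrompt(prompt: Prompt): { text: string; images: ImagePart[] } {
  if (typeof prompt === 'string') return { text: prompt, images: [] }
  const texts: string[] = []
  const images: ImagePart[] = []
  for (const part of prompt) {
    if (part.type === 'text') texts.push(part.text)
    else images.push(part)
  }
  // Assumption: text parts are joined with a single space, otherwise unchanged.
  return { text: texts.join(' '), images }
}

// Map images onto fal-style named fields: masks go to mask_url,
// one source image to image_url, several to image_urls.
function toFalStyleFields(images: ImagePart[]): Record<string, string | string[]> {
  const fields: Record<string, string | string[]> = {}
  const sources: string[] = []
  for (const image of images) {
    if (image.metadata?.role === 'mask') {
      // A scalar field that is already set is a collision: throw rather than overwrite.
      if ('mask_url' in fields) throw new Error('Only one mask image is supported')
      fields['mask_url'] = image.url
    } else {
      sources.push(image.url)
    }
  }
  const [first, ...rest] = sources
  if (first !== undefined && rest.length === 0) fields['image_url'] = first
  else if (rest.length > 0) fields['image_urls'] = sources
  return fields
}

const { text, images } = splitPrompt([
  { type: 'text', text: 'Replace the sky with a sunset' },
  { type: 'image', url: 'https://example.com/photo.png' },
  { type: 'image', url: 'https://example.com/mask.png', metadata: { role: 'mask' } },
])
const falFields = toFalStyleFields(images)
```

Here `text` holds the prompt text unchanged and `falFields` carries `image_url` plus `mask_url`. The commit describes the later per-endpoint field map as replacing this kind of heuristic for known endpoints, with the heuristic kept as a fallback.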
Succeeded
feat/activity-observers
3a347c3b Merge ff221d1d1f663b7c9565a63b55cacc11c40cfed1 into 8fa6cc56c5f36e22885c98a511dcceb2bfc0da1f
by Season Saw
Succeeded
feat/activity-observers
f3525f47 fix(ai): fire video observer finish before terminal stream chunks

Move notifyObserverFinish + settled=true ahead of the generation:result yield in streaming generateVideo. Previously they ran after the yield, so a consumer that stopped reading once it had the result (without pulling RUN_FINISHED) tripped the finally cleanup with settled still false, reporting a spurious cancellation onError and never firing onFinish. Add a regression test for that abandonment case.

Also use the latest gpt-image-2 model id in the otel docs and otelObserver JSDoc examples, and clear lint (import order, redundant casts) in the otel-media e2e route.
by Season
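
The ordering bug described in this commit can be reproduced in a few lines. The sketch below is a simplified illustration, not the library's code: the `Observer` interface, the chunk shape, and both generator functions are hypothetical, and it uses a synchronous generator to stay short. The relevant behaviour is the same for async generators, because leaving a `for...of` or `for await...of` loop early calls the iterator's `return()`, which runs the generator's `finally` block.

```typescript
// Hypothetical chunk and observer shapes, for illustration only.
type Chunk = { type: 'generation:result' | 'RUN_FINISHED' }

interface Observer {
  onFinish(): void
  onError(reason: string): void
}

// Old ordering: the observer is notified only after the result chunk is yielded.
function* finishAfterYield(observer: Observer): Generator<Chunk> {
  let settled = false
  try {
    yield { type: 'generation:result' }
    observer.onFinish()
    settled = true
    yield { type: 'RUN_FINISHED' }
  } finally {
    // If the consumer stopped at the result chunk, settled is still false here.
    if (!settled) observer.onError('cancelled')
  }
}

// Fixed ordering: notify and mark settled before yielding the result chunk.
function* finishBeforeYield(observer: Observer): Generator<Chunk> {
  let settled = false
  try {
    observer.onFinish()
    settled = true
    yield { type: 'generation:result' }
    yield { type: 'RUN_FINISHED' }
  } finally {
    if (!settled) observer.onError('cancelled')
  }
}

// A consumer that stops reading as soon as it has the result.
function consumeUntilResult(stream: (observer: Observer) => Generator<Chunk>): string[] {
  const events: string[] = []
  const observer: Observer = {
    onFinish: () => events.push('finish'),
    onError: (reason) => events.push(`error:${reason}`),
  }
  for (const chunk of stream(observer)) {
    if (chunk.type === 'generation:result') break // abandons the stream early
  }
  return events
}

const oldOrderingEvents = consumeUntilResult(finishAfterYield)
const fixedOrderingEvents = consumeUntilResult(finishBeforeYield)
```

With the old ordering the abandoning consumer records only a cancellation error; with the fixed ordering it records only the finish event.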
Succeeded
feat/activity-observers
f3525f47 Merge f2f0e61635cbd07e8d72a1c8787b24a5c43c2bd3 into 984ac3c8a59e4aef6d3e80b89b2d7986af818850
by Season Saw