storybookjs
storybook
Latest CI Pipeline Executions
Succeeded
34595
01068aa6 Merge remote-tracking branch 'origin/kasper/eval-prompts-from-cli' into cursor/eval-css-loaded-prompt-check
by Kasper Peulen
Canceled
34595
d9db8a81 Eval: Type-annotate env literal in sync-storybook-version

`delete env.CI` failed TypeScript check because spreading `process.env` into a literal with typed string fields narrowed `env` to only those literal keys, losing the `[key: string]: string | undefined` index signature. Annotating as `NodeJS.ProcessEnv` preserves the index signature so `delete env.CI` type-checks.

Fixes the only non-skipped CI failure on this PR (scripts typecheck).
by Kasper Peulen
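The fix described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual script: the helper name `buildChildEnv` and the `STORYBOOK_VERSION` variable are hypothetical, and only the `NodeJS.ProcessEnv` annotation and the `delete env.CI` statement come from the commit message. It assumes Node.js type definitions are available.

```typescript
// Hypothetical helper; only the annotation and the delete come from the commit.
function buildChildEnv(storybookVersion: string): NodeJS.ProcessEnv {
  // Annotating the literal keeps the `[key: string]: string | undefined`
  // index signature, so arbitrary keys such as CI stay addressable.
  const env: NodeJS.ProcessEnv = {
    ...process.env,
    STORYBOOK_VERSION: storybookVersion, // hypothetical variable name
  };

  // Type-checks because CI is reachable through the index signature.
  delete env.CI;
  return env;
}
```

The spread copies `process.env` into a new object, so deleting `CI` from the copy leaves the parent process environment untouched.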
Succeeded
34595
86ae3337 Merge remote-tracking branch 'origin/kasper/eval-prompts-from-cli' into cursor/eval-css-loaded-prompt-check
by Kasper Peulen
Failed
34595
5a85358d Merge remote-tracking branch 'origin/kasper/eval-prompts-from-cli' into cursor/eval-css-loaded-prompt-check

# Conflicts:
# scripts/eval/prompts/pattern-copy-play.md
by Kasper Peulen
Succeeded
34595
fd48b038 Eval: name the CSS-check story `CssCheck` so telemetry can find it

Aligns the prompt + grade check with the Slack agreement: instead of hoping the agent adds *some* `getComputedStyle` call somewhere, the prompt now asks for one story explicitly named `CssCheck`. That specific story name is what the AI-stories vitest run in core will grep for to attribute the pass/fail result in the `ai-setup-final-scoring` telemetry event.

- `pattern-copy-play.md` Step 7: heading + example updated to `export const CssCheck: Story = { ... }`.
- `grade.ts`: `hasComputedStyleAssertion` -> `hasCssCheckStory`, token matched in the diff changed from `getComputedStyle` to `export const CssCheck`.
- `grade.test.ts`: added two tests locking in the new use case (positive: story-file diff with the export; negative: prompt.md false positive).
- Trial / publish / result-docs test mocks renamed to match.

Rationale (from Slack): giving the story a known name means telemetry in core can report on the CSS check result directly, without layering on a separate tag. The story also ends up being educational: a visible example of how to verify CSS loaded. No tag, no new telemetry field required on top of whatever core adds in a follow-up PR.
by Kasper Peulen
Succeeded
34595
88dffdb4 Eval grade: scope CSS-assertion check to added lines in story files

Before: hasComputedStyleAssertion was a plain rawDiff.includes('getComputedStyle'), which matched the prompt markdown (written to .storybook/eval-results/prompt.md before grade runs) and the transcript JSON, both of which contain the token verbatim because the new prompt Step 7 and the agent's own tool-output lines include it. The flag was effectively tautological: true whenever the prompt was staged, regardless of what the agent did.

After: parse the unified patch, track which file each hunk belongs to via the '+++ b/<path>' headers, and only consider added lines (skipping the '+++' header itself) that live in files also present in storybookChanges. Uses the existing STORY_FILE_PATTERN from story-render.ts as the single source of truth for what counts as a story file.

Exports diffAddsTokenInStoryFiles as a pure helper with unit tests covering the false-positive paths (prompt.md / data.json), deleted lines, the +++ header, and files not in storybookChanges.
by Kasper Peulen
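The "After" behaviour in this commit can be sketched as a small pure function. This is an illustrative reconstruction under stated assumptions, not the code from `grade.ts`: the parameter order is a guess, and the `STORY_FILE_PATTERN` below is a stand-in regex, since the real pattern in `story-render.ts` is not shown here.

```typescript
// Assumed stand-in for STORY_FILE_PATTERN from story-render.ts; the real one may differ.
const STORY_FILE_PATTERN = /\.stories\.[cm]?[jt]sx?$/;

// Returns true only if `token` appears on an added line of a changed story file.
function diffAddsTokenInStoryFiles(
  rawDiff: string,
  token: string,
  storybookChanges: readonly string[],
): boolean {
  const changed = new Set(storybookChanges);
  let currentFile: string | null = null;

  for (const line of rawDiff.split('\n')) {
    if (line.startsWith('diff --git ')) {
      // New file section: forget the previous path until its '+++' header is seen.
      currentFile = null;
      continue;
    }
    if (line.startsWith('+++ ')) {
      // File header: record the path, but never treat the header as an added line.
      const path = line.slice(4).trim();
      currentFile = path.startsWith('b/') ? path.slice(2) : null; // '/dev/null' => deleted file
      continue;
    }
    // Only added lines count; removed ('-') and context (' ') lines are skipped.
    if (!line.startsWith('+') || currentFile === null) continue;
    if (!changed.has(currentFile) || !STORY_FILE_PATTERN.test(currentFile)) continue;
    if (line.slice(1).includes(token)) return true;
  }
  return false;
}

// Example patch: the token appears in prompt.md and on a deleted story line,
// while the story file itself only adds the CssCheck export.
const exampleDiff = [
  'diff --git a/.storybook/eval-results/prompt.md b/.storybook/eval-results/prompt.md',
  '--- /dev/null',
  '+++ b/.storybook/eval-results/prompt.md',
  '@@ -0,0 +1 @@',
  '+Use getComputedStyle in one story',
  'diff --git a/src/Button.stories.tsx b/src/Button.stories.tsx',
  '--- a/src/Button.stories.tsx',
  '+++ b/src/Button.stories.tsx',
  '@@ -1,2 +1,2 @@',
  '-const old = getComputedStyle(el);',
  '+export const CssCheck = {};',
].join('\n');
```

With this sketch, searching `exampleDiff` for `getComputedStyle` returns `false` (it is only in the prompt file and on a removed line), while searching for `export const CssCheck` returns `true` when `src/Button.stories.tsx` is listed in `storybookChanges`.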
Succeeded
34595
720d018e Eval prompt: reword end-state sentence to avoid biasing toward render()

'Render call' could read as 'you need a render: () => ... function', which is wrong: args stories have no render call and that's the preferred shape for prop-driven components. Softening to 'just rendering the component in the story is enough' keeps the intent without steering toward render().
by Kasper Peulen
Canceled
34595
45a09496 Prompts: reword end-state sentence to avoid biasing toward render()

'Render call' could read as 'you need a render: () => ... function', which is wrong: args stories have no render call and that's exactly the preferred shape for prop-driven components. Softening to 'just rendering the component in the story is enough' keeps the intent (shared preview does the heavy lifting) without steering the agent away from args.
by Kasper Peulen
Canceled
34595
acb6138f CLI: sync end-state goal + args-vs-render guidance from eval prompt

The eval `pattern-copy-play` prompt has two pieces of guidance that were not in the CLI's shipped `storybook setup` prompt:

- An end-state paragraph clarifying the goal: every component, from a button to a full page, should be addable without story-specific workarounds.
- Args-vs-render authoring guidance: prefer `args` for prop-driven stories, only reach for `render` when composition is needed.

Syncing both into the CLI prompt. The new Args/Render subsection uses two helpers (`getArgsStoryExample`, `getRenderCompositionExample`) matching the existing CSF Factory / CSF3 branch style.
by Kasper Peulen
Succeeded
34595
dbb4a545 CLI: mirror the CSS-loaded prompt step in ai/prompt.ts

The CLI `storybook setup` prompt (shipped to end users) carried a near-identical copy of the eval `pattern-copy-play` prompt. Missed it in the previous commit; syncing the same Step 7 now.
by Kasper Peulen
Canceled
34595
daf25b59 Eval: require a getComputedStyle assertion in stories to prove CSS is loaded

Addresses #34594. Adds a prompt-level instruction and a grading flag that together catch "renders fine, but user CSS never loaded" failures.

- pattern-copy-play.md: new Step 7 requires exactly one story to assert a component-specific computed style via getComputedStyle.
- grade.ts: records hasComputedStyleAssertion based on whether the staged diff contains "getComputedStyle" (reuses the existing cached diff, no extra file reads).

Chose the prompt+diff approach over a runtime stylesheet heuristic (filtering document.styleSheets, isolated all:initial probe, etc.) because:

- The agent already knows what "styled correctly" means for a given component; a component-specific computed-style assertion catches the real failure ("bg-blue-600 did not apply") rather than a generic "something was applied" signal.
- No fragile filtering of vitest-browser / storybook / addon stylesheet sources. Addons keep shipping new sheets; that filter would bit-rot.
- Failures surface as normal Vitest assertion failures and already flow through pass/fail grading: no new counter, no new warning channel, no changes to render-analysis.
- Complementary to a future runtime heuristic if we want one: prompt-level catches "agent misconfigured the design system"; runtime catches "agent shipped a visibly unstyled story without the check".
by Kasper Peulen
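The kind of component-specific assertion this commit asks stories to contain might look like the sketch below. It is an assumption-laden illustration rather than the prompt's actual Step 7 example: `assertComputedStyle`, `StyleReader`, and the fake elements are hypothetical names, the style reader is injected so the snippet runs outside a browser, and the colour value is an illustrative stand-in for whatever `bg-blue-600` resolves to in a given project.

```typescript
// Structural stand-ins so the sketch also runs outside a browser.
// In a real story, `readStyle` would wrap the DOM's getComputedStyle.
interface ComputedStyleLike {
  getPropertyValue(property: string): string;
}
type StyleReader<E> = (element: E) => ComputedStyleLike;

// Hypothetical helper: throws (failing the test) when the computed value differs.
function assertComputedStyle<E>(
  readStyle: StyleReader<E>,
  element: E,
  property: string,
  expected: string,
): void {
  const actual = readStyle(element).getPropertyValue(property).trim();
  if (actual !== expected) {
    throw new Error(
      `Expected ${property} to be "${expected}" but got "${actual}". Did the project CSS load?`,
    );
  }
}

// Fake elements and reader, used only to demonstrate the two outcomes.
type FakeElement = { styles: Record<string, string> };
const fakeReader: StyleReader<FakeElement> = (el) => ({
  getPropertyValue: (property) => el.styles[property] ?? '',
});

// Illustrative values: a styled button versus one whose stylesheet never loaded.
const styledButton: FakeElement = { styles: { 'background-color': 'rgb(37, 99, 235)' } };
const unstyledButton: FakeElement = { styles: { 'background-color': 'rgba(0, 0, 0, 0)' } };
```

In a browser-based story test the reader would presumably be `(el: Element) => getComputedStyle(el)`. The point of the design is that a missing stylesheet surfaces as an ordinary assertion failure with a concrete expected value, rather than as a generic "some CSS was applied" signal.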
Succeeded
34595
daf25b59 Eval: require a getComputedStyle assertion in stories to prove CSS is loaded
by Kasper Peulen
Failed
34595
daf25b59 Eval: require a getComputedStyle assertion in stories to prove CSS is loaded
by Kasper Peulen
Succeeded
34595
daf25b59 Eval: require a getComputedStyle assertion in stories to prove CSS is loaded
by Kasper Peulen
Succeeded
34595
daf25b59 Eval: require a getComputedStyle assertion in stories to prove CSS is loaded
by Kasper Peulen