Implement copilot quality alt#38
Conversation
There was a problem hiding this comment.
⚠️ Not ready to approve
There are correctness and reliability issues in new code paths (notably link-context wording in prompts and loadImageAsDataUrl handling of data: URLs / timeouts) that should be addressed before merging.
Pull request overview
Adds a new opt-in, model-backed alt-text-quality rule to evaluate whether an image’s alt text actually describes the image (using GitHub Models, optionally enriched by an Azure AI Vision pre-pass), and extends image extraction to capture additional context for higher-quality judging.
Changes:
- Introduces the async, opt-in
alt-text-qualityrule plus a new judge layer (Copilot judge, optional Azure-augmented judge, caching). - Enhances
extractImagesandImageRecordwith intrinsic dimensions and page/section/prose context used by the model-backed rule. - Adds an offline probe harness + fixtures and updates docs/schema/tests to cover the new rule and async rule evaluation.
File summaries
| File | Description |
|---|---|
| tsconfig.json | Includes scripts/**/*.ts in typechecking. |
| tests/utils/helpers.ts | Adds ImageRecord defaults for new fields; adds sync-only rule evaluation helper. |
| tests/unit/vague-alt-text.test.ts | Updates typing to account for async-capable Rule.evaluate. |
| tests/unit/missing-alt-text.test.ts | Updates typing to account for async-capable Rule.evaluate. |
| tests/unit/judges-caching.test.ts | Adds unit tests for judge/vision caching behavior. |
| tests/unit/alt-text-quality.test.ts | Adds unit tests for the new alt-text-quality rule behavior. |
| tests/fixtures/alt-quality/cases.json | Adds offline probe fixture cases for expected verdicts. |
| tests/fixtures/alt-quality/cases-github.json | Adds GitHub-specific probe fixture cases. |
| tests/extract.test.ts | Adds tests for capturing pageTitle and sectionHeading. |
| tests/example-site.test.ts | Skips opt-in rules when asserting expected findings in the example site. |
| src/utils/load-image-data-url.ts | Adds helper to load images as data URLs for the rule and probe. |
| src/utils/fetch-with-retry.ts | Adds fetch wrapper with timeout and bounded retry for transient failures. |
| src/types.ts | Extends ImageRecord; makes Rule.evaluate async-capable. |
| src/rules/index.ts | Registers alt-text-quality in the append-only rules list. |
| src/rules/alt-text-quality.ts | Implements the async, opt-in model-backed rule orchestrator. |
| src/judges/types.ts | Defines judge interfaces/types shared across judge implementations. |
| src/judges/prompt.ts | Adds the system prompt and strict JSON schema for structured verdicts. |
| src/judges/index.ts | Adds createJudge() with mode selection and wiring (copilot vs azure-augmented + caching). |
| src/judges/copilot-judge.ts | Implements the GitHub Models-backed judge call. |
| src/judges/caching.ts | Adds content-hash caches for verdicts and Azure vision analysis. |
| src/judges/azure-vision-api-client.ts | Implements Azure AI Vision Image Analysis client. |
| src/judges/azure-augmented-judge.ts | Decorates the Copilot judge with optional Azure pre-analysis context. |
| src/extract.ts | Enhances extraction with nearby text, figcaption, link/button context, page title, and section heading. |
| scripts/probe-alt-quality.ts | Adds offline probe CLI for scoring fixture cases against the judge. |
| schema/config.schema.json | Adds alt-text-quality to config schema and updates schema description. |
| README.md | Documents the new opt-in rule, configuration, and required secrets. |
| package.json | Adds probe script and tsx dev dependency. |
| package-lock.json | Locks new dependencies (tsx, esbuild, etc.). |
| index.ts | Awaits rule evaluation to support async rules. |
| .gitignore | Ignores .env (used by the probe). |
| .github/workflows/scan-static-sites.yml | Adds job-level env wiring for model/vision credentials. |
Copilot's findings
- Files reviewed: 28/31 changed files
- Comments generated: 4
Note
Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
⚠️ Not ready to approve
It introduces a model call pathway that currently forwards raw outerHTML into prompts (risking sensitive URL/query leakage) and includes a few correctness/perf/test-safety issues that should be addressed before approval.
Copilot's findings
- Files reviewed: 29/32 changed files
- Comments generated: 6
Note
Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.
There was a problem hiding this comment.
⚠️ Not ready to approve
The model-backed rule currently risks leaking sensitive URL data (e.g., link hrefs and logged image URLs with querystrings) into model context and CI logs, which should be redacted before merging.
Copilot's findings
- Files reviewed: 29/32 changed files
- Comments generated: 5
Note
Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.
| function normalizeDataUrl(dataUrl: string): string { | ||
| const comma = dataUrl.indexOf(',') | ||
| if (comma === -1) return dataUrl | ||
| const meta = dataUrl.slice('data:'.length, comma) | ||
| if (/;base64/i.test(meta)) return dataUrl | ||
| const payload = dataUrl.slice(comma + 1) | ||
| let decoded: string | ||
| try { | ||
| decoded = decodeURIComponent(payload) | ||
| } catch { | ||
| decoded = payload | ||
| } | ||
| const mime = meta.split(';')[0] || 'text/plain' | ||
| return `data:${mime};base64,${Buffer.from(decoded, 'utf8').toString('base64')}` | ||
| } |
| if (image.inLink) parts.push(`The image is inside a link with href="${image.inLink.href}".`) | ||
| if (image.inButton) parts.push('The image is inside a button (or role="button" element).') |
There was a problem hiding this comment.
I'm gonna add a small redactUrl() helper and apply it here + the console.error lines in evaluate() as they all leak the same querystring stuff into CI logs. The ADR calls this out too. Opening a PR (#45) so we don't end up doing this in two places.
| } catch (err) { | ||
| console.error(`[alt-text-quality] failed to load ${resolved}:`, err) | ||
| continue | ||
| } |
| } catch (err) { | ||
| console.error(`[alt-text-quality] judge failed for ${resolved}:`, err) | ||
| continue | ||
| } |
| export async function fetchWithRetry( | ||
| url: string | URL, | ||
| init: RequestInit, | ||
| options: FetchRetryOptions = {}, | ||
| ): Promise<Response> { |
There was a problem hiding this comment.
+1, this one's worth doing before merge, it's in the path of every model and image call.
taarikashenafi
left a comment
There was a problem hiding this comment.
btw on doing the other stretch goals in this branch, might be easier to get this one in first and then branch the next thing off main? this one's already pretty big and the rules registry seems set up for adding one rule at a time, so keeping them split might make each easier to review since Joyce hasn't' started on this yet.
| if (image.inLink) parts.push(`The image is inside a link with href="${image.inLink.href}".`) | ||
| if (image.inButton) parts.push('The image is inside a button (or role="button" element).') |
There was a problem hiding this comment.
I'm gonna add a small redactUrl() helper and apply it here + the console.error lines in evaluate() as they all leak the same querystring stuff into CI logs. The ADR calls this out too. Opening a PR (#45) so we don't end up doing this in two places.
Yeah for sure, good call. |
…ion, doc/context fixes
…op unsafe test casts
9b5b167 to
5ae4bc1
Compare
| "rules": { | ||
| "type": "object", | ||
| "description": "Per-rule enable/disable overrides. Rules not listed keep their default behavior (today, every rule defaults to enabled).", | ||
| "description": "Per-rule enable/disable overrides. Rules not listed keep their default behavior (every rule defaults to enabled except alt-text-quality, which is opt-in).", |
There was a problem hiding this comment.
| "description": "Per-rule enable/disable overrides. Rules not listed keep their default behavior (every rule defaults to enabled except alt-text-quality, which is opt-in).", | |
| "description": "Per-rule enable/disable overrides. Rules not listed keep their default behavior (enabled with the exception of alt-text-quality).", |
| }, | ||
| "alt-text-quality": { | ||
| "type": "boolean", | ||
| "description": "Opt-in, model-backed. Sends each image and its alt text to a vision model (GitHub Models) to flag quality issues such as redundant prefixes, vague or context-missing descriptions, and medium-announcing link text. Disabled by default; requires GITHUB_MODELS_TOKEN and incurs per-image API cost." |
There was a problem hiding this comment.
what does "medium-announcing link text" mean in this context?
| // GITHUB_MODELS_TOKEN=<pat-with-models:read> npm run probe | ||
| // | ||
| // Optional env: | ||
| // PROBE_MODEL — model id, default "openai/gpt-4o" |
There was a problem hiding this comment.
Should we specify here this must be a vision-enabled model?
| console.log(` reason: ${verdict.reasoning}`) | ||
| console.log('') | ||
| } catch (err) { | ||
| console.log('ERROR') |
There was a problem hiding this comment.
do you want to console.error here to draw attention, or are these relatively common transient errors we might get per-case?
There was a problem hiding this comment.
it also could be helpful to log the case object's alt value for more human-friendly debugging.
|
|
||
| ### Alt-text quality (model-backed, opt-in) | ||
|
|
||
| The five rules above are deterministic pattern matches. `alt-text-quality` goes further: it sends each image and its alt text to a vision model, which judges whether the alt text actually and sufficiently describes the image. This catches plausible-looking but wrong or incomplete alt text — for example `alt="a person"` on a photo of a named individual. |
There was a problem hiding this comment.
| The five rules above are deterministic pattern matches. `alt-text-quality` goes further: it sends each image and its alt text to a vision model, which judges whether the alt text actually and sufficiently describes the image. This catches plausible-looking but wrong or incomplete alt text — for example `alt="a person"` on a photo of a named individual. | |
| The five rules above are deterministic pattern matches. `alt-text-quality` goes further: it sends each image, its alt text, and surrounding context in the DOM to a vision model, which judges whether the alt text actually and sufficiently describes the image. This catches plausible-looking but wrong or incomplete alt text — for example `alt="a person"` on a photo of an individual named in surrounding text. |
|
|
||
| // | ||
| // Run: | ||
| // GITHUB_MODELS_TOKEN=<pat-with-models:read> npm run probe |
There was a problem hiding this comment.
For consistency, it would be good to have this documenting comment also note you can use GH_TOKEN as an alternative.
| // | ||
| // Optional env: | ||
| // PROBE_MODEL — model id, default "openai/gpt-4o" | ||
| // PROBE_CASES — path to a cases.json |
There was a problem hiding this comment.
| // PROBE_CASES — path to a cases.json | |
| // PROBE_CASES — path to a cases.json file |
| // ALT_TEXT_JUDGE_MODE — force "copilot" or "azure-augmented". When unset, | ||
| // auto-selects azure-augmented if AZURE_VISION_* are set. | ||
| // PROBE_MIN_INTERVAL_MS — minimum ms between cases (rate-limit pacing). | ||
| // Set to 3500 for Azure F0's 20-calls/min ceiling. |
There was a problem hiding this comment.
link to documentation? or was this discovered via trial-and-error?
| } | ||
|
|
||
| async function main(): Promise<void> { | ||
| const here = dirname(fileURLToPath(import.meta.url)) |
There was a problem hiding this comment.
| const here = dirname(fileURLToPath(import.meta.url)) | |
| const currentDirectory = dirname(fileURLToPath(import.meta.url)) |
| ? resolve(process.cwd(), process.env['PROBE_CASES']) | ||
| : resolve(here, '..', 'tests', 'fixtures', 'alt-quality', 'cases.json') | ||
|
|
||
| const cases = JSON.parse(await readFile(casesPath, 'utf8')) as ProbeCase[] |
There was a problem hiding this comment.
Might want to gracefully error log and exit if you encounter an invalid JSON file; if someone tries to edit test cases and accidentally leaves out a comma or something similar, could happen!
| console.log(`Model: ${model}`) | ||
| console.log(`Cases: ${casesPath}`) | ||
| console.log(`Total: ${cases.length}`) | ||
| if (minIntervalMs > 0) console.log(`Pacing: ${minIntervalMs}ms minimum between cases`) |
There was a problem hiding this comment.
| if (minIntervalMs > 0) console.log(`Pacing: ${minIntervalMs}ms minimum between cases`) | |
| if (minIntervalMs > 0) console.log(`Pacing: ${minIntervalMs}ms minimum between cases\n`) |
| console.log(`Cases: ${casesPath}`) | ||
| console.log(`Total: ${cases.length}`) | ||
| if (minIntervalMs > 0) console.log(`Pacing: ${minIntervalMs}ms minimum between cases`) | ||
| console.log('') |
There was a problem hiding this comment.
actually, is this even affecting output?
| minImageDimension?: number | ||
| } | ||
|
|
||
| const DEFAULT_TAG_CONFIDENCE = 0.7 |
There was a problem hiding this comment.
just checking, range for this is 0-1, yes? I'm a bit rusty :)
| const DEFAULT_TAG_CONFIDENCE = 0.7 | ||
| const DEFAULT_MAX_TAGS = 8 | ||
| // Azure's documented hard minimum: images must be greater than 50x50 px. | ||
| const DEFAULT_MIN_IMAGE_DIMENSION = 50 |
There was a problem hiding this comment.
and this is both minimum height and minimum width?
Closes github/accessibility#10635
Adds an opt-in, model-backed
alt-text-qualityrule that judges whether an image's alt text actually describes the image — a gap the deterministic rules can't reach.Design & rationale: ADR