Conversation
…sanitizer

Protocol-relative URLs (`//hostname/path`) were not matched by either `sanitizeUrlProtocols()` (which requires an explicit `scheme://`) or `sanitizeUrlDomains()` (which was anchored to `https://`). Browsers on HTTPS pages resolve `//hostname/path` to `https://hostname/path`, so these URLs must be subject to the same domain allowlist check.

Extends `sanitizeUrlDomains()` with a second regex pass for `//hostname/path` patterns. A negative lookbehind `(?<![:/])` prevents the new pattern from matching the `//` inside `https://`. The shared `applyDomainFilter()` helper is used by both the existing `https://` pass and the new protocol-relative pass, ensuring identical allow/redact logic.

Fixes: `//evil.com/steal`, `[click](//evil.com/steal)`, and the image-embed form are now redacted to `(evil.com/redacted)`.

Adds 9 regression tests covering plain text, markdown links, image embeds, allowed/disallowed domains, custom allowed domains, mixed content, path + query strings, and logging.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/0cad179d-8061-4756-945e-9233c95fdae7
Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
Pull request overview
Fixes a sanitizer bypass where protocol-relative URLs (//host/path) were not subject to domain allowlisting, allowing disallowed domains to pass unchanged (including in markdown links/images).
Changes:
- Refactors `sanitizeUrlDomains()` to share allowlist/redaction logic via an inner helper.
- Adds a second domain-filtering pass to redact protocol-relative URLs (`//hostname/...`).
- Adds regression tests covering protocol-relative URLs in plain text, markdown links, and image embeds.
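The two-pass design described in these changes can be sketched as follows. This is a minimal standalone sketch: the allowlist contents, `applyDomainFilter`'s matching rules, and the `httpsUrlRegex` approximation are assumptions; only the protocol-relative regex is quoted from the diff.

```javascript
// Minimal sketch of the two-pass sanitizeUrlDomains() design. The allowlist
// and applyDomainFilter's matching rules are assumed for illustration.
const allowedDomains = ["github.com"];

// Shared allow/redact logic used by both passes (assumed shape).
function applyDomainFilter(match, hostnameWithPort) {
  const hostname = hostnameWithPort.split(":")[0].toLowerCase();
  const allowed = allowedDomains.some(
    (d) => hostname === d || hostname.endsWith("." + d)
  );
  return allowed ? match : "(" + hostname + "/redacted)";
}

function sanitizeUrlDomains(s) {
  // Pass 1: explicit https:// URLs (approximation of the existing pass).
  const httpsUrlRegex = /https:\/\/([\w.-]+(?::\d+)?)(\/(?:(?!https:\/\/)[^\s,])*)?/gi;
  s = s.replace(httpsUrlRegex, (m, host) => applyDomainFilter(m, host));
  // Pass 2: protocol-relative //hostname/path URLs. The lookbehind (?<![:/])
  // keeps this pass from re-matching the // inside https://.
  const protoRelativeUrlRegex = /(?<![:/])\/\/([\w.-]+(?::\d+)?)(\/(?:(?!\/\/)[^\s,])*)?/gi;
  s = s.replace(protoRelativeUrlRegex, (m, host) => applyDomainFilter(m, host));
  return s;
}

console.log(sanitizeUrlDomains("//evil.com/steal")); // (evil.com/redacted)
```

Both passes delegate the allow/redact decision to the same helper, so a protocol-relative URL is treated exactly like its `https://` equivalent.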
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `actions/setup/js/sanitize_content_core.cjs` | Adds protocol-relative URL domain filtering using a second regex pass and shared allowlist logic. |
| `actions/setup/js/sanitize_content.test.cjs` | Adds regression tests ensuring protocol-relative URLs are redacted/allowed appropriately and logged. |
```javascript
// The negative lookbehind (?<![:/]) avoids matching the // inside https://.
// The path stop-condition (?!\/\/) stops before the next protocol-relative URL
// (analogous to how the httpsUrlRegex stops before the next https:// URL).
// The path capture group is present for structural symmetry with httpsUrlRegex
// but is not needed by applyDomainFilter — the full match is used instead.
const protoRelativeUrlRegex = /(?<![:/])\/\/([\w.-]+(?::\d+)?)(\/(?:(?!\/\/)[^\s,])*)?/gi;

s = s.replace(protoRelativeUrlRegex, (match, hostnameWithPort) => applyDomainFilter(match, hostnameWithPort));
```
The protocol-relative regex will also match "//" sequences that occur inside the path of an allowed https:// URL (because the lookbehind only excludes ':' and '/'). For example, in "https://github.com//issues", the "//issues" part is preceded by "m" and will be treated as a protocol-relative URL, potentially redacting/altering an otherwise-allowed URL. Consider tightening the precondition so protocol-relative matches only occur at the start of the string or after clear URL delimiters (whitespace, '(', '[', quotes, etc.), rather than after hostname/path characters.
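The false match described here can be reproduced directly against the regex from the diff:

```javascript
// Reproduction of the issue raised in this comment: the lookbehind (?<![:/])
// only excludes ':' and '/', so the // in the *path* of an allowed URL
// (preceded here by the 'm' of github.com) still matches.
const protoRelativeUrlRegex = /(?<![:/])\/\/([\w.-]+(?::\d+)?)(\/(?:(?!\/\/)[^\s,])*)?/gi;
const hits = "https://github.com//issues".match(protoRelativeUrlRegex);
console.log(hits); // [ '//issues' ]: wrongly treated as a protocol-relative
                   // URL with hostname "issues"
```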
Suggested change:

```javascript
// We only treat // as a protocol-relative URL when it appears at the start of
// the string or immediately after a clear delimiter (whitespace, brackets,
// or quotes). This avoids matching // segments inside the path of an allowed
// https:// URL, such as "https://github.com//issues".
// The path stop-condition (?!\/\/) stops before the next protocol-relative URL
// (analogous to how the httpsUrlRegex stops before the next https:// URL).
// Capture groups:
//   1: prefix (start-of-string or delimiter)
//   2: full protocol-relative URL (starting with //)
//   3: hostname (and optional port)
//   4: optional path
const protoRelativeUrlRegex = /(^|[\s([{"'])(\/\/([\w.-]+(?::\d+)?)(\/(?:(?!\/\/)[^\s,])*)?)/gi;
s = s.replace(
  protoRelativeUrlRegex,
  (match, prefix, url, hostnameWithPort) => prefix + applyDomainFilter(url, hostnameWithPort)
);
```
```javascript
  expect(result).toContain("(evil.com/redacted)");
  expect(result).not.toContain("//evil.com");
});
```
Consider adding a regression test for an allowed https:// URL that contains a double-slash in the path (e.g., "https://github.com//issues") to ensure the new protocol-relative pass doesn’t accidentally treat path "//..." segments as protocol-relative URLs and redact them.
Suggested change:

```javascript
it("should not treat double slashes in https URL paths as protocol-relative URLs", () => {
  const result = sanitizeContent("https://github.com//issues and //evil.com/path");
  expect(result).toContain("https://github.com//issues");
  expect(result).toContain("(evil.com/redacted)");
  expect(result).not.toContain("//evil.com");
});
```
@copilot review comments
… lookbehind

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/faa1705f-42c1-4f84-9165-6067c54c13e4
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Addressed both review comments in ae6a1b8:
@copilot add more positive and negative test cases
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/9aefc8a2-2f5b-4c63-82d8-3bf825bd7468
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Added 8 more tests in d03fdc8, grouped as: Negative (redacted), Positive (allowed), and Boundary (not treated as protocol-relative).
`sanitizeUrlDomains()` was anchored to `https://` and `sanitizeUrlProtocols()` required an explicit scheme, so `//evil.com/path` passed both filters unchanged. Browsers on HTTPS pages resolve `//hostname/path` → `https://hostname/path`, fully bypassing the domain allowlist, including in image embeds where camo proxy requests leak to the attacker.

Changes
- `sanitize_content_core.cjs`: `sanitizeUrlDomains()` shares allow/redact logic through `applyDomainFilter(match, hostnameWithPort)`; the new protocol-relative pass is anchored with `(^|[\s([{"'])`, which matches `//` only at start-of-string or after clear URL delimiters (whitespace, brackets, quotes), avoiding false matches on `//` inside HTTPS URL paths (e.g. `https://github.com//issues`); the stop-condition `(?!\/\/)` is symmetric with the existing `https://` stop-condition.
- `sanitize_content.test.cjs`: "protocol-relative URL sanitization" describe block with 10 regression tests: plain text, markdown links (`[text](//evil.com/path)`), image embeds, allowed domains, subdomains, custom domain env var, mixed content, path+query, redaction logging, and a regression case ensuring `https://` URLs with double-slash paths (e.g. `https://github.com//issues`) are not incorrectly redacted.
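The final protocol-relative pass described above behaves like this sketch. The allowlist contents and redaction format are assumptions; the regex is the one quoted in the description.

```javascript
// Sketch of the merged pass: the delimiter prefix is captured and re-emitted,
// and the URL portion goes through the same allow/redact decision as the
// https:// pass. Allowlist and redaction format are assumed for illustration.
const protoRelativeUrlRegex = /(^|[\s([{"'])(\/\/([\w.-]+(?::\d+)?)(\/(?:(?!\/\/)[^\s,])*)?)/gi;

function redactProtocolRelative(s, allowed = ["github.com"]) {
  return s.replace(protoRelativeUrlRegex, (match, prefix, url, hostnameWithPort) => {
    const hostname = hostnameWithPort.split(":")[0].toLowerCase();
    const ok = allowed.some((d) => hostname === d || hostname.endsWith("." + d));
    return prefix + (ok ? url : "(" + hostname + "/redacted)");
  });
}

console.log(redactProtocolRelative("see //evil.com/x and //github.com/ok"));
// see (evil.com/redacted) and //github.com/ok
```

Allowed hosts pass through unchanged with their prefix intact, and a `//` inside an `https://` URL path is never reached because no delimiter precedes it.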
sanitize_content_core.cjs—sanitizeUrlDomains()applyDomainFilter(match, hostnameWithPort)(^|[\s([{"'])— matches//only at start-of-string or after clear URL delimiters (whitespace, brackets, quotes), avoiding false matches on//inside HTTPS URL paths (e.g.https://github.com//issues); stop-condition(?!\/\/)is symmetric with the existinghttps://stop-conditionsanitize_content.test.cjs"protocol-relative URL sanitization"describe block with 10 regression tests: plain text, markdown links ([text](//evil.com/path)), image embeds (), allowed domains, subdomains, custom domain env var, mixed content, path+query, redaction logging, and a regression case ensuringhttps://URLs with double-slash paths (e.g.https://github.com//issues) are not incorrectly redacted