Abusing true-case URL parsing

The web has evolved significantly over the past few years, and URL-parsing logic is now put to the test heavily in critical areas such as SSO. Parsing and validating URLs correctly is therefore a major security concern. But what happens when correct parsing behaviour is itself what leads to a critical security issue?

In this paper, I define `true-case URL parsing` as parsing behaviour that is correct. Abusing true-case URL parsing therefore means finding security issues in web apps that parse URLs correctly when making security-related decisions but, in doing so, unintentionally help create vulnerabilities.

HOST-and-Origin confusions:

What do you think are the host and origin of:


Easy, right? It’s huntingreads.com.

Let's level up:


To a security novice it may look like huntingreads.com, but we know it’s example.com, since anything before the “@” (apart from some exceptions) is treated as the username/password section of the URL.

OK, so what are the host and origin of:


In this case, it is huntingreads.com and not example.com, because a forward slash ends the host. Anything after it, even the @, is treated as part of the path rather than the host.
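
The three cases so far can be checked with the WHATWG `URL` parser that ships in browsers and Node.js. This is a minimal sketch: the helper name `splitUrl` is hypothetical, and the three example URLs are assumptions reconstructed from the discussion and from the parser output shown later in this paper.

```javascript
// Hypothetical helper: summarise how the WHATWG URL parser splits a URL string.
function splitUrl(rawUrl) {
  const url = new URL(rawUrl);
  return {
    username: url.username,
    hostname: url.hostname,
    pathname: url.pathname,
    origin: url.origin,
  };
}

// Assumed example URLs, one per case discussed above.
const plain = splitUrl('https://huntingreads.com');
const withUserinfo = splitUrl('https://huntingreads.com@example.com');
const withSlash = splitUrl('https://huntingreads.com/@example.com');

console.log(plain.hostname);        // huntingreads.com
console.log(withUserinfo.hostname); // example.com (username is huntingreads.com)
console.log(withSlash.hostname);    // huntingreads.com (pathname is /@example.com)
```

The only difference between the last two inputs is a single "/", yet it moves huntingreads.com from the username section into the host.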

Now, what would be the correct origin and host for a URL that is:


Browsers auto-correct backslashes to forward slashes (even in the Omnibox, and even in the username/password section). Therefore, if you copy the above URL and paste it into the URL bar, it will open huntingreads.com, because the backslash is converted to a forward slash. This is where things get interesting.

The browser will take you to huntingreads.com if you open it, but what is a valid origin of the above URL?

The answer is "example.com":

new URL('https://huntingreads.com\@example.com')
URL {origin: "https://example.com", protocol: "https:", username: "huntingreads.com", password: "", host: "example.com", …}
hash: ""
host: "example.com"
hostname: "example.com"
href: "https://huntingreads.com@example.com/"
origin: "https://example.com"
password: ""
pathname: "/"
port: ""
protocol: "https:"
search: ""
searchParams: URLSearchParams {}
username: "huntingreads.com"
__proto__: URL

Browsers do correct a backslash to a forward slash, but in many languages, such as JavaScript, a backslash inside a string literal is treated as an escape character and not as a literal backslash. In the example above, the “\@” was reduced to a plain “@” before the parser ever saw it. If we do want to pass a literal backslash (which will later be converted to a forward slash), we need to escape the backslash itself, like so:

new URL('https://huntingreads.com\\@example.com')
URL {origin: "https://huntingreads.com", protocol: "https:", username: "", password: "", host: "huntingreads.com", …}
hash: ""
host: "huntingreads.com"
hostname: "huntingreads.com"
href: "https://huntingreads.com/@example.com"
origin: "https://huntingreads.com"
password: ""
pathname: "/@example.com"
port: ""
protocol: "https:"
search: ""
searchParams: URLSearchParams {}
username: ""
__proto__: URL

This time, the origin and host will be huntingreads.com.
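
The difference between the two transcripts comes down to what the string contains by the time it reaches the parser. The sketch below makes that visible. The variable names are illustrative, and `String.raw` is used only as an alternative way of writing a literal backslash.

```javascript
// '\@' is not a recognised escape sequence, so JavaScript keeps only the '@'.
const singleBackslash = 'https://huntingreads.com\@example.com';

// '\\' is an escaped backslash, so the string really contains one '\'.
const escapedBackslash = 'https://huntingreads.com\\@example.com';

// String.raw leaves backslashes untouched, so this equals escapedBackslash.
const rawBackslash = String.raw`https://huntingreads.com\@example.com`;

console.log(singleBackslash.includes('\\'));    // false
console.log(escapedBackslash === rawBackslash); // true

// The parser never sees a backslash in the first string, so '@' ends the userinfo.
console.log(new URL(singleBackslash).hostname); // example.com

// Here the parser does see '\' and treats it like '/', which ends the host.
console.log(new URL(rawBackslash).hostname);    // huntingreads.com
console.log(new URL(rawBackslash).href);        // https://huntingreads.com/@example.com
```

The same text therefore yields two different hosts, depending on whether the backslash survives into the string that is actually parsed.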


A URL written like this: https://huntingreads.com\@example.com; is given an origin of example.com, as shown above. Any web app using a language like JS would, and should, treat it that way. But if the host value is relied on for a security-sensitive action, such as deciding whether to pass authentication tokens to the origin/host of a parsed URL, it opens the door to vulnerabilities. This example can have the following attack scenario:

  • A language like JS validates the origin/host of https://evil.com\@trusted.com as trusted.com.
  • If the origin/host is trusted.com, auth tokens are sent to the full URL, like:
  • This results in auth token theft, since the token is not actually sent to trusted.com.
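
The gap between the host that gets validated and the host that actually receives the request can be sketched as follows. Everything here is an assumption made for illustration: `naiveHost` is a hypothetical validator that reads the authority the way a strict RFC 3986-style parser would (only "/", "?" and "#" end it), `whatwgHost` stands in for the browser or HTTP client that finally opens the URL, and `TRUSTED_HOST` is a made-up allow-list entry. It is not a claim about how any particular application is implemented.

```javascript
// Hypothetical validator: only "/", "?" and "#" terminate the authority,
// so a backslash is treated as an ordinary character inside it.
function naiveHost(rawUrl) {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/?#]*)/i.exec(rawUrl);
  if (match === null) return null;
  const authority = match[1];
  // Everything up to the last "@" is treated as userinfo.
  const hostPort = authority.slice(authority.lastIndexOf('@') + 1);
  // Drop an optional port and normalise case.
  return hostPort.replace(/:\d*$/, '').toLowerCase();
}

// The host a WHATWG-style client (for example a browser) will connect to.
function whatwgHost(rawUrl) {
  return new URL(rawUrl).hostname;
}

const TRUSTED_HOST = 'trusted.com'; // assumed allow-listed host

// Attacker-supplied URL containing a literal backslash.
const target = String.raw`https://evil.com\@trusted.com`;

const validatedHost = naiveHost(target);
const contactedHost = whatwgHost(target);

console.log(validatedHost === TRUSTED_HOST); // true, so the check passes
console.log(contactedHost);                  // evil.com, where the token would go
```

One way to avoid this mismatch is to validate with the same parser that will perform the request, and to send the token to the parsed, re-serialised URL rather than to the raw input string.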

Other attacks resulting from this behaviour are likely to exist.