Hacker & Security Analyst at HackerOne

Python module UrlParse – Improper input validation leads to Open Redirect

March 3, 2015, Yassine ABOUKIR

Title : Python module UrlParse – Improper input validation leads to Open Redirect
Credit : Yassine ABOUKIR
CVE : CVE-2015-2104 (Reserved)
Disclosure Date : 02/24/2015
Vendor : Python Software Foundation (http://www.python.org)
Affected versions : Python 2.7/3.2/3.3/3.4/3.5/3.6
CVSS Score : 4.3 (Medium)

urlparse module overview :
The urlparse module defines a standard interface for breaking URL strings up into components (addressing scheme, network location, path, etc.).

>>> from urlparse import urlparse, urlunparse
>>> urlparse("http://www.example.com/connect/login")
ParseResult(scheme='http', netloc='www.example.com', path='/connect/login', params='', query='', fragment='')

urlparse.urlunparse() : this function combines the components returned by urlparse() back into a URL string, which is expected to match the original URL.
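The session above uses the Python 2 module name. As a minimal sketch, assuming Python 3 (where the same functions live in urllib.parse), the split-and-rebuild round trip for an ordinary URL looks like this:

```python
from urllib.parse import urlparse, urlunparse

url = "http://www.example.com/connect/login"

# Break the URL into its six components.
parts = urlparse(url)
print(parts.scheme)   # http
print(parts.netloc)   # www.example.com
print(parts.path)     # /connect/login

# Reassemble the components; for a well-formed URL like this one
# the result is identical to the input.
rebuilt = urlunparse(parts)
print(rebuilt == url)  # True
```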

Vulnerability description and Impact :
The urlparse module lacks proper input validation, which leads to an open redirect vulnerability.
URLs do not survive the round trip through urlunparse(urlparse(url)). Python sees ////foo.com as a URL with no hostname or scheme and a path of //foo.com, but when it reconstructs the URL after parsing, the result is //foo.com. That string is no longer a path: it is a scheme-relative reference whose host is foo.com.

Consequently, this bug can be exploited in practice with a link such as : http://www.example.com/login?next=////evil.com

The user may be subjected to phishing attacks by being redirected to an untrusted, attacker-controlled web page that appears to be a trusted web site. The phishers may then steal the user's credentials and use them to access the legitimate web site. Because the server name in the modified link is identical to that of the original site, phishing attempts have a more trustworthy appearance.
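To make the scenario concrete, here is a minimal sketch of the kind of redirect logic that is exposed. It assumes Python 3's urllib.parse, and the function name `vulnerable_next_target` and the fallback to "/" are hypothetical, not taken from any real application. The handler checks the parsed `next` value for a scheme or host, then redirects to the rebuilt URL rather than to the value it checked:

```python
from urllib.parse import parse_qs, urlparse, urlunparse

def vulnerable_next_target(request_url):
    """Hypothetical login handler logic: pick the post-login redirect target."""
    query = parse_qs(urlparse(request_url).query)
    target = query.get("next", ["/"])[0]

    parsed = urlparse(target)
    # The check sees no scheme and no host, so the target looks local...
    if parsed.scheme or parsed.netloc:
        return "/"  # off-site target: fall back to the home page
    # ...but the redirect uses the rebuilt string, not the checked value.
    return urlunparse(parsed)

# An obviously external target is rejected.
print(vulnerable_next_target("http://www.example.com/login?next=http://evil.com"))

# The crafted target passes the check; what comes back depends on how the
# running interpreter reassembles a path that starts with '//'.
print(vulnerable_next_target("http://www.example.com/login?next=////evil.com"))
```

On an affected interpreter the second call yields a string that a browser reads as a link to evil.com, even though the check concluded the target had no host.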

Proof Of Concept :

>>> x = urlparse("////evil.com")

////evil.com will be parsed as a relative-path URL, which is the correct, expected behaviour.

>>> print x
ParseResult(scheme='', netloc='', path='//evil.com', params='', query='', fragment='')

As you can see, two slashes are removed and the URL is treated as a relative-path URL. But when we reconstruct the URL with urlunparse() and parse the result again, evil.com is treated as the network location (the host) rather than as part of the path.

>>> x = urlunparse(urlparse("////evil.com"))
>>> urlparse(x)
ParseResult(scheme='', netloc='evil.com', path='', params='', query='', fragment='')
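The same check can be scripted. The following is a sketch assuming Python 3's urllib.parse; the helper name `round_trip_report` is made up for illustration. Because the handling of such paths in urlunparse() may differ between Python releases, the script prints what the running interpreter does instead of assuming an outcome:

```python
from urllib.parse import urlparse, urlunparse

def round_trip_report(url):
    """Parse, rebuild, and re-parse `url`; report whether the host changed."""
    first = urlparse(url)
    rebuilt = urlunparse(first)
    second = urlparse(rebuilt)
    return {
        "input": url,
        "rebuilt": rebuilt,
        "netloc_before": first.netloc,
        "netloc_after": second.netloc,
        "host_changed": first.netloc != second.netloc,
    }

report = round_trip_report("////evil.com")
for key, value in report.items():
    print(f"{key}: {value!r}")
```

If `host_changed` is True, the interpreter exhibits the behaviour described in this advisory.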

This vulnerability can be exploited in practice this way : https://www.example.com/login?next=////evil.com

Mitigation :
This can be mitigated by checking whether the path starts with a double slash and, if so, URL-encoding those leading slashes. Otherwise, it is recommended not to rely on urlunparse(urlparse(url)) to validate a URL.
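A rough sketch of both suggestions, assuming Python 3's urllib.parse. The helper names `encode_leading_double_slash` and `is_local_redirect` are hypothetical, and encoding only the second slash is one possible reading of the advice above, not a vetted fix. The validator inspects the raw string that will actually be sent to the browser instead of a rebuilt URL:

```python
from urllib.parse import urlsplit

def encode_leading_double_slash(path):
    """Percent-encode the second of two leading slashes so the path
    cannot be re-read as '//host' once it is serialized."""
    if path.startswith("//"):
        return "/%2F" + path[2:]
    return path

def is_local_redirect(target):
    """Return True only for targets that look like same-site absolute paths."""
    # Check the raw string, not a rebuilt URL: accept only a path that
    # starts with exactly one slash.
    if not target.startswith("/") or target.startswith("//"):
        return False
    # Extra precaution beyond the article: browsers may treat '\' like '/'.
    if "\\" in target:
        return False
    parts = urlsplit(target)
    return not parts.scheme and not parts.netloc

for candidate in ["/account", "//evil.com", "////evil.com", "http://evil.com"]:
    print(candidate, is_local_redirect(candidate))
```

Rejecting a suspicious target outright is usually simpler to reason about than rewriting it, which is why the validator returns a boolean instead of a repaired URL.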
Greetz : Thanks to the Reddit.com security team for their valuable collaboration and help.
This article has 2 comments
  1. Albert S
    May 28, 2015

    I think this is already mentioned by homakov here :

    • Yassine ABOUKIR
      June 26, 2015

      Egor, in the article, invites us to use libraries instead in order to mitigate the issue. But some libraries are not as secure as they seem. Python, for example, fails to reconstruct the URL as I described above, leading to an open redirect.
