Python module UrlParse – Improper input validation leads to Open Redirect

Title : Python module UrlParse – Improper input validation leads to Open Redirect
Credit : Yassine ABOUKIR
CVE : CVE-2015-2104 (Reserved)
Disclosure Date : 02/24/2015
Vendor : Python Software Foundation (https://www.python.org)
Affected versions : Python 2.7/3.2/3.3/3.4/3.5/3.6
CVSS Score : 4.3 (Medium)

Urlparse module Overview :
The urlparse module defines a standard interface for breaking URL strings into components (addressing scheme, network location, path, etc.).

>>> from urlparse import urlparse, urlunparse
>>> urlparse("https://www.example.com/connect/login")
ParseResult(scheme='https', netloc='www.example.com', path='/connect/login', params='', query='', fragment='')

urlparse.urlunparse() : this function recombines the components returned by urlparse() into a URL string, which is expected to match the original URL.
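As a point of reference, a well-formed absolute URL does survive the round trip. The sketch below assumes Python 3, where the same functions live in `urllib.parse` rather than in the Python 2 `urlparse` module; the variable names are illustrative only.

```python
from urllib.parse import urlparse, urlunparse

url = "https://www.example.com/connect/login"
parts = urlparse(url)

# Each component is available as a named attribute of the result.
print(parts.scheme, parts.netloc, parts.path)

# Reassembling the components yields the original string for this input.
rebuilt = urlunparse(parts)
print(rebuilt == url)  # True
```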

Vulnerability description and Impact :
The urlparse module lacks proper input validation, which leads to an open redirect vulnerability.
URLs do not survive the round-trip through `urlunparse(urlparse(url))`. Python sees `////foo.com` as a URL with no hostname or scheme and a path of `//foo.com`, but when it reconstructs the URL after parsing, it becomes `//foo.com`, which is a scheme-relative URL pointing at the host `foo.com`.
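The round trip can be inspected step by step. This is a minimal sketch assuming Python 3's `urllib.parse`; `round_trip` is a hypothetical helper name. The first parse is the same everywhere, but the rebuilt string depends on the interpreter: affected versions drop two slashes, while releases that include a fix for the referenced bug report may return the input unchanged.

```python
from urllib.parse import urlparse, urlunparse

def round_trip(url):
    """Hypothetical helper: parse a URL and immediately reassemble it."""
    return urlunparse(urlparse(url))

original = "////foo.com"

# First parse: no scheme, no host, and a path that starts with two slashes.
first = urlparse(original)
print(repr(first.netloc), repr(first.path))

# Rebuild and parse again. On affected interpreters the rebuilt string is
# "//foo.com", so the second parse reports foo.com as the host.
rebuilt = round_trip(original)
second = urlparse(rebuilt)
print(repr(rebuilt), repr(second.netloc))
```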

Consequently, this bug can be practically exploited this way : https://www.example.com/login?next=////evil.com

The user may be subjected to phishing attacks by being redirected to an untrusted, attacker-controlled web page that appears to be a trusted web site. The phishers may then steal the user's credentials and use them to access the legitimate web site. Because the server name in the modified link is identical to the original site, phishing attempts have a more trustworthy appearance.
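To make the attack path concrete, here is a sketch of the kind of redirect handling that is exposed. It is not taken from any real application: `naive_is_local` and `naive_redirect_location` are hypothetical names, and the code assumes Python 3's `urllib.parse`. The check accepts `////evil.com` because the first parse reports no scheme and no host, and the handler then rebuilds the value instead of validating the rebuilt string.

```python
from urllib.parse import urlparse, urlunparse, parse_qs

def naive_is_local(target):
    """Hypothetical check: accept targets with no scheme and no host."""
    parts = urlparse(target)
    return parts.scheme == "" and parts.netloc == ""

def naive_redirect_location(request_url):
    """Hypothetical handler: return the Location value, or None if rejected."""
    query = urlparse(request_url).query
    target = parse_qs(query).get("next", ["/"])[0]
    if not naive_is_local(target):
        return None
    # The flaw: the value is rebuilt after validation, and the rebuilt
    # string is never checked again.
    return urlunparse(urlparse(target))

location = naive_redirect_location(
    "https://www.example.com/login?next=////evil.com"
)
# On affected interpreters this prints a scheme-relative URL for evil.com.
print(location)
```

An obviously external target such as `next=https://evil.com` is rejected by the same check, which is why the multi-slash form is the interesting case.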

Proof Of Concept :

>>> x = urlparse("////evil.com")

////evil.com is parsed as a relative-path URL, which is the expected behaviour:

>>> print(x)
ParseResult(scheme='', netloc='', path='//evil.com', params='', query='', fragment='')

As you can see, two slashes are removed and the URL is marked as a relative-path URL. However, when we reconstruct it with the urlunparse() function and parse the result again, it is treated as a scheme-relative URL whose host is evil.com.

>>> x = urlunparse(urlparse("////evil.com"))
>>> urlparse(x)
ParseResult(scheme='', netloc='evil.com', path='', params='', query='', fragment='')

Mitigation :
This can be mitigated by checking whether the path starts with a double slash and, if so, URL-encoding the two leading slashes. Otherwise, it is recommended not to rely on urlunparse(urlparse(url)) to validate a URL.
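One way to read the first suggestion is sketched below. It assumes Python 3's `urllib.parse`, and `neutralize_leading_slashes` is a hypothetical helper, not the fix applied in the referenced commit. It percent-encodes a leading double slash in the parsed path, so the rebuilt string can no longer be read as a host.

```python
from urllib.parse import urlparse, urlunparse

def neutralize_leading_slashes(url):
    """Hypothetical helper: percent-encode a path's leading double slash."""
    parts = urlparse(url)
    if parts.path.startswith("//"):
        # ParseResult is a namedtuple, so _replace returns a modified copy.
        parts = parts._replace(path="%2F%2F" + parts.path[2:])
    return urlunparse(parts)

rebuilt = neutralize_leading_slashes("////evil.com")
print(rebuilt)                  # %2F%2Fevil.com
print(urlparse(rebuilt).netloc == "")  # True: no host after re-parsing
```

Whatever form the check takes, validating the final string that will be sent in the redirect, rather than an earlier parsed form, avoids depending on the round trip.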
References:
https://docs.python.org/2/library/urlparse.html
https://bugs.python.org/issue23505
https://github.com/reddit/reddit/commit/689a9554e60c6e403528b5d62a072ac4a072a20e
Greetz : Thanks to the Reddit.com security team for their valuable collaboration and help.

Written on March 3, 2015