Python’s urlparse module breaks down URLs in to components. It supports all the URL schemes specified in RFC 3986 1.

>>> import urlparse
>>>
>>> urlparse.urlparse("https://example.com/foo?param=bar")
ParseResult(scheme='https', netloc='example.com', path='/foo', params='', query='param=bar', fragment='')
>>>
>>> urlparse.urlparse("file://example.com/etc/fstab")
ParseResult(scheme='file', netloc='example.com', path='/etc/fstab', params='', query='', fragment='')
>>>
>>> urlparse.urlparse("news:comp.infosystems.www.misc")
ParseResult(scheme='news', netloc='', path='comp.infosystems.www.misc', params='', query='', fragment='')

If you have questions or comments about this blog post, you can get in touch with me on Twitter @sdqali.

If you liked this post, you'll also like...