Wednesday, 15 January 2014

python - urllib2: retrieve an arbitrary file based on URL and save it into a named file




I am writing a Python script that uses the urllib2 module as an equivalent of the command-line utility wget. The function I want should retrieve an arbitrary file based on a URL and save it into a named file. I only need to worry about two command-line arguments: the URL of the file to be downloaded and the name of the file in which the content will be saved.

Example:

python prog7.py www.python.org pythonhomepage.html

This is my code:

    import urllib
    import urllib2
    #import requests

    url = 'http://www.python.org/pythonhomepage.html'

    print "downloading with urllib"
    urllib.urlretrieve(url, "code.txt")

    print "downloading with urllib2"
    f = urllib2.urlopen(url)
    data = f.read()
    with open("code2.txt", "wb") as code:
        code.write(data)

urllib seems to work, but urllib2 does not.

The errors I received:

  File "problem7.py", line 11, in <module>
    f = urllib2.urlopen(url)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib64/python2.6/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.6/urllib2.py", line 429, in error
    result = self._call_chain(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 616, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib64/python2.6/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.6/urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

This is due to the different behavior of urllib and urllib2. The web page returns a 404 error (page not found). urllib2 "catches" this and raises an HTTPError, while urllib downloads the HTML of the returned page regardless of the error. If you still want to write that HTML to a text file, you can read it from the error:

    import urllib2

    try:
        data = urllib2.urlopen('http://www.python.org/pythonhomepage.html').read()
    except urllib2.HTTPError as e:
        print e.code
        print e.msg
        print e.headers
        # Read the error body once; a second read() would return an empty string.
        body = e.fp.read()
        print body
        with open("code2.txt", "wb") as code:
            code.write(body)

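In the HTTPError call at the end of the traceback, req is the request object, fp is a file-like object with the HTTP error body, code is the three-digit code of the error, msg is the user-visible explanation of the code, and hdrs is a mapping object with the headers of the error.

More information about HTTPError: see the urllib2 documentation.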
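Putting the pieces together, here is a minimal sketch of the two-argument script the question asks for. The names download and opener are hypothetical, chosen only for illustration. The sketch makes two assumptions that are not in the original posts: a bare host name such as www.python.org gets an http:// prefix (urllib2.urlopen rejects a URL without a scheme), and on an HTTP error the error page itself is saved, which imitates what urllib.urlretrieve did above. The import fallback is only there so the same code also runs on Python 3, where these names live in urllib.request.

```python
import sys

try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 exposes the same names here


def download(url, filename, opener=urllib2.urlopen):
    """Save the body served at url into filename and return the HTTP status.

    opener is a hypothetical hook: any callable that takes a URL and
    returns a response object with read() and getcode().
    """
    if "://" not in url:
        url = "http://" + url  # assumption: a bare host name means http
    try:
        response = opener(url)
        status = response.getcode()
        body = response.read()
    except urllib2.HTTPError as e:
        # Assumption: keep the error page, as urllib.urlretrieve would.
        status = e.code
        body = e.read()  # read only once; a second read() returns nothing
    with open(filename, "wb") as out:
        out.write(body)
    return status


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(download(sys.argv[1], sys.argv[2]))
    else:
        sys.stderr.write("usage: python prog7.py URL FILENAME\n")
```

Run as in the example from the question: python prog7.py www.python.org pythonhomepage.html. The returned status lets the caller decide whether a saved 404 page should count as a failure.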

