My Blog: python - urllib2 retrieve an arbitrary file based on URL and save it into a named file

Wednesday, 15 January 2014

I am writing a Python script that uses the urllib2 module as an equivalent of the command-line utility wget. The function I want should retrieve an arbitrary file based on a URL and save it into a named file. I only need to worry about two command-line arguments: the URL of the file to be downloaded and the name of the file in which the content will be saved.

Example:

python prog7.py www.python.org pythonhomepage.html

This is my code:

    import urllib
    import urllib2
    #import requests

    url = 'http://www.python.org/pythonhomepage.html'

    print "downloading with urllib"
    urllib.urlretrieve(url, "code.txt")

    print "downloading with urllib2"
    f = urllib2.urlopen(url)
    data = f.read()
    with open("code2.txt", "wb") as code:
        code.write(data)

urllib seems to work, but urllib2 does not.

Errors received:

      File "problem7.py", line 11, in <module>
        f = urllib2.urlopen(url)
      File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib64/python2.6/urllib2.py", line 397, in open
        response = meth(req, response)
      File "/usr/lib64/python2.6/urllib2.py", line 510, in http_response
        'http', request, response, code, msg, hdrs)
      File "/usr/lib64/python2.6/urllib2.py", line 429, in error
        result = self._call_chain(*args)
      File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
        result = func(*args)
      File "/usr/lib64/python2.6/urllib2.py", line 616, in http_error_302
        return self.parent.open(new, timeout=req.timeout)
      File "/usr/lib64/python2.6/urllib2.py", line 397, in open
        response = meth(req, response)
      File "/usr/lib64/python2.6/urllib2.py", line 510, in http_response
        'http', request, response, code, msg, hdrs)
      File "/usr/lib64/python2.6/urllib2.py", line 435, in error
        return self._call_chain(*args)
      File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
        result = func(*args)
      File "/usr/lib64/python2.6/urllib2.py", line 518, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 404: Not Found

This is due to the different behavior of urllib and urllib2. The web page returns a 404 error (page not found). urllib2 "catches" it and raises an HTTPError, while urllib downloads the HTML of the returned page regardless of the error. If you want to write that HTML to a text file, you can get it from the error object:

    import urllib2

    try:
        data = urllib2.urlopen('http://www.python.org/pythonhomepage.html').read()
    except urllib2.HTTPError as e:
        print e.code
        print e.msg
        print e.headers
        # Read the error body once; a second read() would return an empty string.
        body = e.fp.read()
        print body
        with open("code2.txt", "wb") as code:
            code.write(body)

Here req is the request object, fp is a file-like object with the HTTP error body, code is the three-digit code of the error, msg is the user-visible explanation of the code, and hdrs is a mapping object with the headers of the error.
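As a rough illustration of those fields, here is a minimal sketch, assuming Python 3, where urllib2.HTTPError has moved to urllib.error.HTTPError. The helper name describe_http_error and the hand-built 404 error (with its example.invalid URL and HTML body) are hypothetical; they stand in for a real server response so the snippet runs without a network connection.

```python
import io
from email.message import Message
from urllib.error import HTTPError  # Python 3 home of urllib2.HTTPError


def describe_http_error(err):
    """Collect the url, code, msg, headers and body of an HTTPError."""
    body = err.read()  # read once; the stream is exhausted afterwards
    return {
        "url": err.url,
        "code": err.code,
        "msg": err.msg,
        "headers": dict(err.headers.items()),
        "body": body,
    }


# Hypothetical error, built by hand with the same five arguments that
# urllib passes: url, code, msg, hdrs, fp.
headers = Message()
headers["Content-Type"] = "text/html"
fake_error = HTTPError(
    "http://example.invalid/missing.html",
    404,
    "Not Found",
    headers,
    io.BytesIO(b"<h1>Not Found</h1>"),
)

summary = describe_http_error(fake_error)
print(summary["code"], summary["msg"])
```

In real use the error object would come from an except clause around urlopen rather than being constructed by hand.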

More information about HTTPError: see the urllib2 documentation.
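Putting the pieces together, the wget-like script from the question could look roughly like the sketch below. It assumes Python 3 (urllib.request and urllib.error in place of urllib2). The names normalize_url, download and main, and the opener parameter, are hypothetical choices for illustration. Prepending http:// is also an assumption: it lets a bare host such as www.python.org from the example command be opened, since urlopen rejects URLs without a scheme.

```python
import shutil
import sys
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def normalize_url(url):
    """Prepend http:// when the URL has no scheme (assumed convenience)."""
    return url if "://" in url else "http://" + url


def download(url, filename, opener=urlopen):
    """Save the resource at url into filename and return the byte count.

    opener is any callable that takes a URL and returns a file-like
    object; it defaults to urllib.request.urlopen.
    """
    response = opener(normalize_url(url))
    try:
        with open(filename, "wb") as out:
            shutil.copyfileobj(response, out)  # stream in chunks
            return out.tell()
    finally:
        response.close()


def main(argv):
    """Handle the two command-line arguments: URL and output file name."""
    if len(argv) != 3:
        sys.stderr.write("usage: %s URL FILENAME\n" % argv[0])
        return 2
    try:
        size = download(argv[1], argv[2])
    except HTTPError as e:
        # Raised for responses such as the 404 in the traceback above.
        sys.stderr.write("HTTP error %d: %s\n" % (e.code, e.msg))
        return 1
    except URLError as e:
        sys.stderr.write("could not reach server: %s\n" % e.reason)
        return 1
    print("saved %d bytes to %s" % (size, argv[2]))
    return 0
```

To run it as a script, call sys.exit(main(sys.argv)) under an if __name__ == "__main__": guard. HTTPError is caught before URLError because it is a subclass of URLError.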
