Sunday, 15 February 2015

JSON.loads() ValueError Extra Data in Python


I'm trying to read individual values from a JSON feed. Here is an illustration of the feed data:

{ "sendtoken": "token1", "bytes_transferred": 0, "num_retries": 0, "timestamp": 1414395374, "queue_time": 975, "message": "internalerror", "id": "mailerx", "m0": { "binding_group": "domain.com", "recipient_domain": "hotmail.com", "recipient_local": "destination", "sender_domain": "domain.com", "binding": "mail.domain.com", "message_id": "c1/34-54876-d36fa645", "api_credential": "creds", "sender_local": "localstring" }, "rejecting_ip": "145.5.5.5", "type": "alpha", "message_stage": 3 } { "sendtoken": "token2", "bytes_transferred": 0, "num_retries": 0, "timestamp": 1414397568, "queue_time": 538, "message": "internal error, "id": "mailerx", "m0": { "binding_group": "domain.com", "recipient_domain": "hotmail.com", "recipient_local": "destination", "sender_domain": "domain.com", "binding": "mail.domain.com", "message_id": "c1/34-54876-d36fa645", "api_credential": "creds", "sender_local": "localstring" }, "rejecting_ip": "145.5.5.5", "type": "alpha", "message_stage": 3 }

I can't share the actual URL, but the above is the first 2 of 150 results displayed if I run

print results

before the

json.loads()

line.

My code:

import urllib2
import json

results = urllib2.urlopen(url).read()
jsondata = json.loads(results)

for row in jsondata:
    print row['sendtoken']
    print row['recipient_domain']

I'd like output like

token1 hotmail.com

for each entry.

I'm getting this error:

ValueError: Extra data: line 2 column 1 - line 133 column 1 (char 583 - 77680)

I'm far from a Python expert, and this is my first time working with JSON. I've spent quite a bit of time looking on Google and Stack Overflow, but I can't find a solution that works for my specific data format.

The problem is that the data don't form a single JSON object, so you can't decode them with json.loads.
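A minimal sketch of why this fails, using two small made-up objects rather than the real feed (the string `feed` and its contents are assumptions for illustration). json.loads expects exactly one JSON document, so anything left over after the first object is reported as extra data:

```python
import json

# Two hypothetical objects separated by a space, mimicking the shape of the feed.
feed = '{"sendtoken": "token1"} {"sendtoken": "token2"}'

try:
    json.loads(feed)
    error_message = None
except ValueError as exc:
    # In Python 3 this is json.JSONDecodeError, a subclass of ValueError.
    error_message = str(exc)

print(error_message)
```

The first object decodes fine; the exception is raised only because a second object follows it in the same string.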

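First, this appears to be a sequence of JSON objects separated by spaces. Since you won't tell us where the data come from, this is just an educated guess; hopefully whatever documentation or coworker or whatever told you about this URL also told you what the format actually is. But let's assume my educated guess is correct.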

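The easiest way to parse a stream of JSON objects in Python is to use the raw_decode method. Like this:*

import json

def parse_json_stream(stream):
    # `stream` is a string holding whitespace-separated JSON objects.
    decoder = json.JSONDecoder()
    while stream:
        # raw_decode returns the decoded object and the index where it ended.
        obj, idx = decoder.raw_decode(stream)
        yield obj
        # Drop the decoded part, plus the whitespace before the next object.
        stream = stream[idx:].lstrip()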

However, there's an error in the second JSON object in the stream. At this part:

… "message": "internal error, "id": "mailerx", …

There's a missing " after "internal error. If you fix that, the function above will iterate over the two JSON objects.

Hopefully that error was caused by you trying to manually "copy and paste" the data by rewriting it. If it's in your original source data, you've got a much bigger problem; you need to write a "broken JSON" parser from scratch that can heuristically guess at what the data were intended to be. Or, of course, get whoever's generating the source to generate it properly.

* In general, it's more efficient to use the second argument to raw_decode to pass a start index, instead of slicing off a copy of the remainder each time. But raw_decode can't handle leading whitespace. It's a little easier to just slice and strip than to write code that skips over whitespace from the given index, but if the memory and performance costs of those copies matter, you should write the more complicated code.
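One possible shape for that more complicated version, as a sketch rather than a definitive implementation. The names `iter_json_objects`, `WHITESPACE`, and `sample` are made up for this example, and the sample data is a trimmed-down stand-in for the feed. It passes a start index to raw_decode and skips whitespace by hand, so no copies of the remaining text are made:

```python
import json

# Characters that JSON treats as insignificant whitespace between values.
WHITESPACE = " \t\r\n"

def iter_json_objects(text):
    """Yield each JSON value from a string of whitespace-separated values."""
    decoder = json.JSONDecoder()
    idx = 0
    end = len(text)
    while True:
        # Skip whitespace manually, because raw_decode rejects leading whitespace.
        while idx < end and text[idx] in WHITESPACE:
            idx += 1
        if idx >= end:
            return
        # raw_decode returns the decoded value and the index just past it.
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

# Hypothetical, shortened stand-in for the feed shown in the question.
sample = ('{"sendtoken": "token1", "m0": {"recipient_domain": "hotmail.com"}} '
          '{"sendtoken": "token2", "m0": {"recipient_domain": "hotmail.com"}}')

rows = list(iter_json_objects(sample))
for row in rows:
    print(row["sendtoken"] + " " + row["m0"]["recipient_domain"])
```

Note that in the feed shown in the question, recipient_domain is nested inside the "m0" object, so it is read as row["m0"]["recipient_domain"] rather than row["recipient_domain"].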
