Thursday, 15 August 2013

How to let imported Python modules use a builtin hash() function which behaves the same between 32 and 64 bit?



The following Python snippet behaves differently depending on whether the code is run on a 32-bit or a 64-bit architecture:

PYTHONHASHSEED=0 python3 -c 'print(hash("a"))'

On 32-bit architectures it prints -845962679, while on 64-bit architectures it prints -7583489610679606711.
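A minimal sketch for checking which of the two cases a given interpreter falls into, assuming CPython 3.2 or later, where sys.hash_info is available. The helper name describe_hash is only for illustration.

```python
import sys

def describe_hash():
    """Report the width of hash values and of the platform word size."""
    return {
        # Width in bits of the values returned by hash().
        "hash_bits": sys.hash_info.width,
        # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
        "pointer_bits": sys.maxsize.bit_length() + 1,
    }

info = describe_hash()
print(info)
```

On the interpreters discussed here both numbers should agree, which is why the same string hashes to values of different size on the two architectures.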

This in turn means that, when setting PYTHONHASHSEED=0, the order of dictionary keys, for example, depends on the architecture and is only deterministic within 32-bit or within 64-bit architectures, respectively.

This problem can be worked around either by making sure that the output is sorted or by monkeypatching the hash function like this:

oldhash = __builtins__.hash
__builtins__.hash = lambda x: oldhash(x) & 0xffffffff
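The first workaround, sorting the output, can be sketched as follows for code that is under one's own control. The helper names dump_deterministic and sorted_members are hypothetical. The sketch assumes the keys or members are mutually comparable (here: strings).

```python
import json

def dump_deterministic(mapping):
    """Serialize a dict with its keys in sorted order, independent of hash order."""
    return json.dumps(mapping, sort_keys=True)

def sorted_members(items):
    """Return the members of a set (or any iterable) in sorted order."""
    return sorted(items)

print(dump_deterministic({"b": 1, "a": 2}))    # {"a": 2, "b": 1}
print(sorted_members({"n48", "n1616", "n6"}))  # ['n1616', 'n48', 'n6']
```

Because the order is derived from the values themselves and not from their hashes, the result does not depend on the width of hash().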

Unfortunately, neither of these workarounds works when one uses an external library. In my case I want to use the networkx module. While the networkx maintainers might solve this issue at some point (either by providing sorted output or by patching the hash function), they might decide not to, or they might only fix the problem in the far future. Plus, networkx is not the only module producing output that depends on the output of the hash function, and I'd like to find a solution that fixes them all right now without having to wait for external projects or having to carry local patches.

So my question boils down to: is it possible to modify the Python hash function used by the modules I import?
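As a rough sketch of how far a patched builtins.hash reaches, assuming CPython: the module demo_external below is a hypothetical stand-in for an imported library, built in memory so that the example is self-contained. Python code inside it that calls hash() by name picks up the replacement, while building a set there does not go through that name at all.

```python
import builtins
import types

# Hypothetical stand-in for an external module that cannot be edited.
demo = types.ModuleType("demo_external")
exec(
    "def explicit(x):\n"
    "    return hash(x)\n"
    "\n"
    "def container(items):\n"
    "    return set(items)\n",
    demo.__dict__,
)

calls = []
original_hash = builtins.hash

def recording_hash(obj):
    """Masked hash that also records every explicit call made through the name."""
    calls.append(obj)
    return original_hash(obj) & 0xFFFFFFFF

builtins.hash = recording_hash
try:
    explicit_result = demo.explicit(-5)    # looked up via builtins, so patched
    demo.container(["a", "b", "c"])        # hashing happens inside the interpreter
finally:
    builtins.hash = original_hash          # always undo the global patch

print(explicit_result)  # 4294967291
print(calls)            # [-5]
```

Only the explicit call shows up in calls, which suggests that the iteration order of dicts and sets inside an imported module is not affected by replacing the builtin name.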

Or is there another solution that lets me use modules which produce output that depends on the hash function in such a way that it is deterministic between 32-bit and 64-bit architectures?

Edit: it was suggested that I make my question more specific. I'm choosing networkx as an example to demonstrate the problem. I'd still want a general solution that I can apply to other modules I import as well.

Consider the following graph test.xml in GraphML:

<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <graph edgedefault="directed">
    <node id="n1616"/> <node id="n48"/> <node id="n3637"/> <node id="n2842"/> <node id="n2530"/> <node id="n2396"/>
    <node id="n6453"/> <node id="n278"/> <node id="n1209"/> <node id="n92"/> <node id="n793"/> <node id="n3631"/>
    <node id="n341"/> <node id="n3151"/> <node id="n1717"/> <node id="n890"/> <node id="n11399"/> <node id="n203"/>
    <node id="n1928"/> <node id="n555"/> <node id="n156"/> <node id="n553"/> <node id="n2524"/> <node id="n3396"/>
    <node id="n1741"/> <node id="n4117"/> <node id="n959"/> <node id="n1667"/> <node id="n6489"/> <node id="n4973"/>
    <node id="n2247"/> <node id="n927"/> <node id="n1211"/> <node id="n5467"/> <node id="n450"/> <node id="n1727"/>
    <node id="n3531"/> <node id="n6357"/> <node id="n317"/> <node id="n37"/> <node id="n14349"/> <node id="n1530"/>
    <node id="n12429"/> <node id="n249"/> <node id="n348"/> <node id="n3285"/> <node id="n2518"/> <node id="n406"/>
    <node id="n2034"/> <node id="n2855"/> <node id="n6"/> <node id="n4742"/> <node id="n125"/> <node id="n281"/>
    <node id="n44"/> <node id="n924"/> <node id="n926"/> <node id="n251"/> <node id="n5455"/> <node id="n666"/>
    <node id="n3112"/> <node id="n2870"/> <node id="n6452"/> <node id="n3156"/> <node id="n2299"/> <node id="n416"/>
    <node id="n4556"/> <node id="n1832"/> <node id="n89"/> <node id="n2342"/> <node id="n1327"/> <node id="n1333"/>
    <node id="n542"/> <node id="n674"/> <node id="n47"/> <node id="n1174"/> <node id="n102"/> <node id="n1570"/>
    <node id="n1362"/> <node id="n9721"/> <node id="n789"/> <node id="n270"/> <node id="n1524"/> <node id="n4616"/>
    <node id="n6093"/> <node id="n2386"/>
  </graph>
</graphml>

Then the following produces different output depending on the architecture:

PYTHONHASHSEED=0 python3 -c "import networkx, sys; g = networkx.read_graphml('test.xml'); networkx.write_graphml(g, sys.stdout.buffer)" | md5sum

It should instead be the same on both. I'm not able to come up with a more minimal example, because it seems that if I remove any node from the XML input above, the output becomes the same. I don't know why that happens.

It would help if networkx did not rely on dictionary order for its output but instead allowed the output to be sorted. When I brought this up and filed https://github.com/networkx/networkx/issues/1181, the bug was closed after some discussion without accepting my patch. In the meanwhile I fixed part of the problem by using PYTHONHASHSEED=0, but that does not fix the problem of different output between 32-bit and 64-bit architectures.
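One possible stopgap that needs no patch to the library is to normalize its output after the fact, before comparing or checksumming it. The sketch below uses only the standard library. The helper canonical_graphml_digest is hypothetical, and it assumes that the order of node and edge elements inside each graph element carries no meaning and that whitespace-only text is just indentation. It compares a normalized form, so it does not make the bytes written by the library itself identical.

```python
import hashlib
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def _sort_key(elem):
    # Order by tag first, then by the attributes that identify nodes and edges.
    return (elem.tag, elem.get("id", ""), elem.get("source", ""), elem.get("target", ""))

def canonical_graphml_digest(xml_data):
    """MD5 of GraphML text or bytes after sorting the children of every graph."""
    root = ET.fromstring(xml_data)
    for elem in root.iter():
        # Drop indentation so that reordering does not move whitespace around.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    for graph in list(root.iter("{%s}graph" % GRAPHML_NS)):
        graph[:] = sorted(graph, key=_sort_key)
    return hashlib.md5(ET.tostring(root)).hexdigest()

# Two serializations of the same graph with different node order and indentation.
first = ('<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
         '<graph edgedefault="directed"><node id="n48"/><node id="n1616"/></graph>'
         '</graphml>')
second = ('<graphml xmlns="http://graphml.graphdrawing.org/xmlns">\n'
          '  <graph edgedefault="directed">\n'
          '    <node id="n1616"/>\n'
          '    <node id="n48"/>\n'
          '  </graph>\n'
          '</graphml>')
print(canonical_graphml_digest(first) == canonical_graphml_digest(second))  # True
```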

python hash 32bit-64bit monkeypatching
