python - Recognize byte strings
I'm trying to figure out whether it is possible to recognize the different strings in a byte array. Allow me to elaborate with an example:
b'libgcj-13.dll\x00_jv_registerclasses\x00\x00\x00unhandled vlc exception\n\x00\x00w\x00,\x00 \x00c\x00c\x00s\x00=\x00u\x00t\x00f\x00-\x008\x00\x00\x00\nerror while opening file\x00\x00\x00[\x00v\x00e\x00r\x00s\x00i\x00o\x00n\x00]\x00\n\x00o\x00s\x00=\x00%\x00d\x00.\x00%\x00'

This is only a small part of the data; for obvious reasons I'll post just a small part. I expect the strings in the byte array to be formatted as either 'ascii' or 'unicode'. By eye it is pretty easy to recognize the different strings in the array:
libgcj-13.dll followed by a null terminator - ascii
_jv_registerclasses followed by a null terminator - ascii
unhandled vlc exception followed by a null terminator - ascii
w, ccs=utf-8 followed by a null terminator - unicode

Now the problem starts when I read the byte array byte by byte. The first 3 strings do not produce problems, until I reach the first unicode string. When I read per byte, I receive "\x00w" as "\x00" and "w" separately. The "\x00" isn't a null terminator but part of "\x00w". The next list should make this clear:
b'\x00', b'\x00', b'w', b'\x00', b',', b'\x00', b' ', b'\x00', b'c', b'\x00', b'c', b'\x00', b's', b'\x00', b'=', b'\x00', b'u', b'\x00', b't', b'\x00', b'f', b'\x00', b'-', b'\x00', b'8', b'\x00'

From that point on I am receiving unicode characters, so I should read 2 bytes at a time instead of 1, so that I don't read them as null terminators but as unicode characters. My question: is there a way to detect this? Or is there another way to solve the problem?
Hopefully I explained it well enough. Kind regards
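One possible way to approach this, as a rough sketch rather than a definitive answer: scan the data with two byte patterns, one for runs of printable ASCII and one for runs where every printable byte is followed by a zero byte. In the sample the zero byte appears to follow each character (for example `[\x00v\x00e\x00`), which suggests UTF-16 little-endian, so the pair would be "w\x00" rather than "\x00w"; that is an assumption about the data, not something the question confirms. The names `extract_strings`, `ASCII_RUN`, `WIDE_RUN` and the minimum length of 4 are made up for this illustration.

```python
import re

# Assumption: runs shorter than this are treated as noise (arbitrary choice).
MIN_LEN = 4

# Runs of printable ASCII bytes (plus tab and newline).
ASCII_RUN = re.compile(rb"[\t\n\x20-\x7e]{%d,}" % MIN_LEN)
# The same bytes, each followed by a zero byte (assumed UTF-16LE).
WIDE_RUN = re.compile(rb"(?:[\t\n\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_strings(data):
    """Return (offset, kind, text) tuples for the runs found in data."""
    found = []
    pos = 0
    while pos < len(data):
        # Try the two-byte pattern first, so "w\x00,\x00..." is not
        # mistaken for single characters separated by terminators.
        wide = WIDE_RUN.match(data, pos)
        if wide:
            found.append((pos, "utf-16-le", wide.group().decode("utf-16-le")))
            pos = wide.end()
            continue
        narrow = ASCII_RUN.match(data, pos)
        if narrow:
            found.append((pos, "ascii", narrow.group().decode("ascii")))
            pos = narrow.end()
            continue
        pos += 1  # terminator, padding or other binary data: skip it
    return found


# First part of the byte array from the question.
sample = (b'libgcj-13.dll\x00_jv_registerclasses\x00\x00\x00'
          b'unhandled vlc exception\n\x00\x00'
          b'w\x00,\x00 \x00c\x00c\x00s\x00=\x00u\x00t\x00f\x00-\x008\x00\x00\x00')

for offset, kind, text in extract_strings(sample):
    print(offset, kind, repr(text))
```

This is only a heuristic: it handles just the ASCII range, and a short ASCII string that sits directly before a wide string can still be split in the wrong place. It also will not recognize big-endian data unless the wide pattern is changed to put the zero byte first.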
python arrays unicode ascii