Monday, 15 September 2014

FileReader - Removing the BOM character with Java




This question already has an answer here:

Byte order mark screws up file reading in Java (7 answers)

I am trying to read files using FileReader and write them to a separate file. These files are UTF-8 encoded, but unfortunately some of them still contain a BOM. The relevant code I tried is this:

    private final String UTF8_BOM = "\uFEFF";

    private String removeUTF8BOM(String s) {
        // Strip the BOM character if the string begins with it.
        if (s.startsWith(UTF8_BOM)) {
            s = s.replace(UTF8_BOM, "");
        }
        return s;
    }

    line = removeUTF8BOM(line);

But for some reason the BOM is not removed. Is there any other way I can do this with FileReader? I know there is a BOMInputStream that should work, but I'd rather find a solution using FileReader.

Naive solution to the question as asked:

    public static void main(final String[] args) {
        final String hasBOM = "\uFEFF" + "Hello World!";
        // Drop the first char only when it is the BOM character U+FEFF.
        final String noBOM = hasBOM.charAt(0) == '\uFEFF' ? hasBOM.substring(1) : hasBOM;
        System.out.println(hasBOM.equals(noBOM));
    }

Outputs:

    false

Proper solution approach:

You should never program against a file-based API; instead, program against InputStream/OutputStream so that your code is portable to different source locations.

The class below is an untested illustration of how you might go about encapsulating this behavior in an InputStream to make it transparent.

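    import java.io.IOException;
    import java.io.InputStream;
    import java.io.PushbackInputStream;

    public class BomProofInputStream extends InputStream {
        private final PushbackInputStream is;
        private boolean isFirstRead = true;

        public BomProofInputStream(final InputStream is) {
            // Room to push back up to three bytes if they turn out not to be a BOM.
            this.is = new PushbackInputStream(is, 3);
        }

        @Override
        public int read() throws IOException {
            if (this.isFirstRead) {
                this.isFirstRead = false;
                // At the byte level the UTF-8 BOM is the sequence EF BB BF;
                // a single byte (0..255) can never equal the char U+FEFF.
                final byte[] head = new byte[3];
                int n = 0;
                while (n < 3) {
                    final int r = is.read(head, n, 3 - n);
                    if (r < 0) {
                        break;
                    }
                    n += r;
                }
                final boolean isBom = n == 3 && (head[0] & 0xFF) == 0xEF
                        && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF;
                if (!isBom && n > 0) {
                    // Not a BOM: hand the bytes back so the caller still sees them.
                    is.unread(head, 0, n);
                }
            }
            return is.read();
        }
    }

Found a full-fledged example by searching: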

java filereader byte-order-mark
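
Since the question asks for a Reader-level solution, here is a minimal sketch of the same idea one layer up, at the character level. It assumes the input really is UTF-8 and decodes it explicitly, because FileReader has traditionally used the platform default charset, under which the BOM bytes may not decode to U+FEFF at all. The names BomSkippingReader and open are hypothetical, chosen only for this illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
public class BomSkippingReader {

    /** Wraps a UTF-8 byte stream in a Reader with any leading U+FEFF removed. */
    public static Reader open(final InputStream in) throws IOException {
        // Decode explicitly as UTF-8 instead of relying on the platform default.
        final PushbackReader reader =
                new PushbackReader(new InputStreamReader(in, StandardCharsets.UTF_8), 1);
        final int first = reader.read();
        // Java's UTF-8 decoder does not strip the BOM; it shows up as the char U+FEFF.
        if (first != -1 && first != '\uFEFF') {
            // Not a BOM: push the character back so the caller still sees it.
            reader.unread(first);
        }
        return reader;
    }

    public static void main(final String[] args) throws IOException {
        // Assumed sample data: a UTF-8 BOM followed by the text "hi".
        final byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        try (BufferedReader br = new BufferedReader(open(new ByteArrayInputStream(data)))) {
            System.out.println(br.readLine()); // prints "hi"
        }
    }
}
```

For a file on disk, the same method can be handed a FileInputStream in place of the ByteArrayInputStream. Because the check happens once when the stream is opened, the line-by-line loop no longer needs to call a BOM-removal helper on every line.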
