xerces-j-users mailing list archives

From Andy Clark <an...@apache.org>
Subject Re: going crazy with this: org.xml.sax.SAXParseException: Content is not allowed in prolog
Date Mon, 01 Aug 2005 04:05:09 GMT
Martin Vysny wrote:
> I had the same problem as well. When you save a file in
> notepad.exe as UTF-8, it places a 3-byte invisible UTF-8 character
> at the start of the XML file. That is what causes that goddamn
> "Content is not allowed in prolog" message.

That's probably not the problem because Xerces has a custom
UTF-8 reader that knows how to skip the BOM. Unless, of
course, the application doesn't give the parser the chance
to pick the proper java.io.Reader for the input. This can
happen when the application constructs an input source with
a Reader object instead of an InputStream. For example:

   Reader reader = new InputStreamReader(stream);
   InputSource source = new InputSource(reader);

In this case, the input stream reader uses the default
system encoding, usually ISO Latin 1 on English systems. This
is normally OK because every byte (even with the high bit on)
is valid in that encoding. The exception is the UTF-8 byte
order mark: its three bytes (EF BB BF) are decoded as three
ordinary characters ahead of the XML declaration, and those
end up looking like "content [that] is not allowed in [the]
prolog". Even constructing an input stream reader with the
encoding set to "UTF-8" doesn't help, because that uses the
Java UTF-8 reader, which doesn't understand the BOM and
passes it through as the character U+FEFF.
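
To make the two ways out concrete, here is a minimal sketch rather
than anything taken from Xerces itself. The class name BomAwareInput
and both helper methods are hypothetical names made up for this
example; it assumes a JAXP parser is available (the JDK ships one).
The first helper simply gives the parser the InputStream so it can
pick the reader on its own; the second is for code that really must
hand over a Reader, and drops a leading U+FEFF by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class BomAwareInput {

    // Option 1: pass the raw bytes, so the parser can detect the
    // encoding (and deal with a BOM) itself.
    static InputSource fromBytes(InputStream stream) {
        return new InputSource(stream);
    }

    // Option 2: if a Reader is unavoidable, decode as UTF-8 and drop a
    // leading U+FEFF, which is what the BOM bytes EF BB BF decode to.
    // Assumes the input really is UTF-8.
    static Reader utf8ReaderWithoutBom(InputStream stream) throws IOException {
        PushbackReader reader = new PushbackReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8), 1);
        int first = reader.read();
        if (first != -1 && first != 0xFEFF) {
            reader.unread(first); // not a BOM: put the character back
        }
        return reader;
    }

    public static void main(String[] args) throws Exception {
        // A tiny document with a UTF-8 BOM in front, as Notepad would save it.
        byte[] xml = "\uFEFF<root/>".getBytes(StandardCharsets.UTF_8);
        Reader reader = utf8ReaderWithoutBom(new ByteArrayInputStream(xml));
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Option 1 is usually the better choice when the application has the
bytes, since the parser can then also honor the encoding named in the
XML declaration. Option 2 only makes sense when the encoding is
already known to be UTF-8.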

Andy Clark * andyc@apache.org
