xerces-j-users mailing list archives

From Paul Ekeland <ekel...@maths.fr>
Subject Re: going crazy with this: org.xml.sax.SAXParseException: Content is not allowed in prolog
Date Fri, 29 Jul 2005 09:52:35 GMT
Well, everything indicates there are no hidden characters before the
beginning of the file. Both the "debug" command you suggested (see
results below) and reading the first characters of the InputStream up
to the first '<' show that '<' is indeed the first character
encountered.
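
The InputStream check described above could be sketched roughly as follows. This is only an illustration, not the code actually used in this thread: the class name PrologCheck and the method offsetOfFirstMarkup are made up for the example. It counts how many bytes precede the first '<' (0x3C), so any value other than 0 means something sits in front of the markup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Xerces or of the original poster's code.
class PrologCheck {

    /** Returns how many bytes precede the first '<' (0x3C), or -1 if there is none. */
    static int offsetOfFirstMarkup(InputStream in) throws IOException {
        int offset = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == 0x3C) {
                return offset;
            }
            offset++;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] clean = "<?xml version=\"1.0\"?><a/>".getBytes(StandardCharsets.ISO_8859_1);
        // UTF-16 little-endian BOM (FF FE) followed by "<a" in UTF-16LE
        byte[] withBom = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00, 0x61, 0x00};

        System.out.println(offsetOfFirstMarkup(new ByteArrayInputStream(clean)));   // prints 0
        System.out.println(offsetOfFirstMarkup(new ByteArrayInputStream(withBom))); // prints 2
    }
}
```

Note that the check has to run on the raw bytes. If it runs on a Reader, or on a stream some other layer has already decoded, a byte order mark may have been consumed before the check ever sees it.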

Could it come from the encoding of the file?
I have set everything I could to "iso-8859-1", though, so that every
aspect of the parsing is coherent.
Where does the prolog start and end? Maybe the problem comes from the
end of the prolog?
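
On the encoding question, one way to rule out a mismatch between the declared encoding and a hand-built Reader is to give the parser the raw bytes and let it read the encoding from the XML declaration itself. The sketch below uses only the standard JAXP classes; the class name ParseFromBytes and the sample document are assumptions made for the example, and it is not known whether the original code used a Reader at all.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; the sample document is invented for illustration.
class ParseFromBytes {

    /** Parses XML from raw bytes so the parser itself decides how to decode them. */
    static Document parse(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // A byte-stream InputSource: no Reader and no charset chosen by the caller.
        InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
        return builder.parse(source);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<word>caf\u00e9</word>";
        // Encode the text exactly as the declaration claims.
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        Document doc = parse(bytes);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

With a byte stream the parser detects a byte order mark, if any, and then honours the encoding pseudo-attribute. With a character stream the decoding has already been done by the caller, so a wrong charset there can corrupt the very first characters of the document.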
...

PS: the extract from debug.exe

0D49:0100  3C 3F 78 6D 6C 20 76 65-72 73 69 6F 6E 3D 22 31   <?xml version="1
0D49:0110  2E 30 22 20 65 6E 63 6F-64 69 6E 67 3D 22 69 73   .0" encoding="is
0D49:0120  6F 2D 38 38 35 39 2D 31-22 3F 3E 0A 0A 3C 21 44   o-8859-1"?>..<!D
0D49:0130  4F 43 54 59 50 45 20 55-6E 69 74 2D 6F 66 2D 73   OCTYPE Unit-of-s
0D49:0140  74 75 64 79 0A 20 20 50-55 42 4C 49 43 20 22 2D   tudy.  PUBLIC "-
0D49:0150  2F 2F 4F 55 4E 4C 2F 2F-44 54 44 20 45 4D 4C 2F   //OUNL//DTD EML/
0D49:0160  58 4D 4C 20 62 69 6E 64-69 6E 67 20 31 2E 30 2F   XML binding 1.0/
0D49:0170  31 2E 30 2F 2F 45 4E 22-20 22 68 74 74 70 3A 2F   1.0//EN" "http:/

Robert Houben wrote:

>This may not be your problem, but I've wasted tons of time in the past
>because of these symptoms, so here is why it happened to me...
>
>I have seen this happen when a file is read that contains byte order
>marks at the beginning.  Most editors strip these out and get the
>encoding right, so you don't know this is happening.  If you are doing
>your own file reader to get an InputStream, you may need to skip a few
>bytes at the beginning, setting the encoding value correctly based on
>them, prior to setting up the reader. To tell if this is happening to
>you, on a Windows system, use the debug.exe command from the command
>line:
>
>C:\>debug test.xml
>-d
>1480:0100  FF FE 3C 00 74 00 65 00-73 00 74 00 3E 00 74 00   ..<.t.e.s.t.>.t.
>1480:0110  65 00 73 00 74 00 3C 00-2F 00 74 00 65 00 73 00   e.s.t.<./.t.e.s.
>1480:0120  74 00 3E 00 0D 00 0A 00-00 00 00 00 00 00 00 00   t.>.............
>1480:0130  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
>1480:0140  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
>1480:0150  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
>1480:0160  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
>1480:0170  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
>-q
>
>C:\>
>
>Note that the file starts with "FF FE", which is the UTF-16
>little-endian byte order mark (BOM).  If you create your own file
>reader and try to pull this in, you will encounter the error that you
>are mentioning.  Notepad will show this as normal text; you'll never
>see the funny stuff.
>
>HTH,
>
>-----Original Message-----
>From: Andy Clark [mailto:andyc@apache.org] 
>Sent: Wednesday, July 27, 2005 5:46 PM
>To: j-users@xerces.apache.org
>Subject: Re: going crazy with this: org.xml.sax.SAXParseException:
>Content is not allowed in prolog
>
>Paul Ekeland wrote:
>  
>
>>my problem is that I cannot see any whitespace/strange characters
>>before the root element of the document. I have used several
>>different hexadecimal editors to check that, with no success! Do you
>>have a different way to find out whether such things exist?
>>    
>>
>
>Can you attach the first few lines of the file to a
>followup message? (Attach, not paste.)
>
>  
>
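
For reference, the advice quoted above (skip the BOM bytes and pick the encoding from them before setting up the reader) might look something like this. It is a minimal sketch under stated assumptions: the class BomSkipper and its methods are invented for the example, only the UTF-8 and UTF-16 marks are handled, and the fallback charset is whatever the caller believes the file to be.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Xerces. UTF-32 marks are not handled.
class BomSkipper {

    /** Opens a Reader on raw bytes, consuming a leading BOM and choosing the charset from it. */
    static Reader openSkippingBom(InputStream raw, Charset fallback) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length) {               // read up to 3 bytes, tolerating short reads
            int r = in.read(head, n, head.length - n);
            if (r < 0) break;
            n += r;
        }

        Charset charset = fallback;
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        }

        // Put back everything that was read but is not part of the BOM.
        if (n > bomLength) {
            in.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(in, charset);
    }

    /** Drains a Reader into a String (small inputs only). */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // FF FE followed by "<a/>" in UTF-16LE, like the start of the dump quoted above
        byte[] data = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00, 0x61, 0x00, 0x2F, 0x00, 0x3E, 0x00};
        Reader reader = openSkippingBom(new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1);
        System.out.println(readAll(reader)); // prints <a/>
    }
}
```

None of this is needed when the parser is given the byte stream directly, since it then does its own BOM detection. It only matters when the application insists on building the Reader itself.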


