xerces-j-users mailing list archives

From Pete Hendry <peter.hen...@capeclear.com>
Subject Re: going crazy with this: org.xml.sax.SAXParseException: Content is not allowed in prolog
Date Fri, 29 Jul 2005 10:18:41 GMT
My guess is that George is right and that the DTD URL is resolving to HTML
rather than XML (probably an error page).
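
One way to test this guess is to fetch the first few bytes from the DTD URL
and look at them directly. The following is only a minimal sketch: the class
name DtdSniffer and the looksLikeHtml heuristic are made up for illustration
and are not part of Xerces, and the default URL is simply the one George
quoted. It assumes Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of Xerces: peeks at what a DTD URL returns.
public class DtdSniffer {

    /** Reads at most max bytes from the start of the stream (Java 11+). */
    public static byte[] readHead(InputStream in, int max) throws IOException {
        return in.readNBytes(max);
    }

    /** Crude heuristic: does the content start like an HTML page? */
    public static boolean looksLikeHtml(byte[] head) {
        // ISO-8859-1 maps every byte to a character, so decoding never fails.
        String text = new String(head, StandardCharsets.ISO_8859_1)
                .trim()
                .toLowerCase(Locale.ROOT);
        return text.startsWith("<!doctype html") || text.startsWith("<html");
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the system identifier mentioned in this thread.
        String url = args.length > 0 ? args[0] : "http://127.0.0.1:8083/dtd/eml10.dtd";
        try (InputStream in = URI.create(url).toURL().openStream()) {
            byte[] head = readHead(in, 256);
            System.out.println(new String(head, StandardCharsets.ISO_8859_1));
            System.out.println("Looks like HTML: " + looksLikeHtml(head));
        }
    }
}
```

If the printed text is an HTML page rather than DTD declarations, the parser
is most likely choking on that page and not on the XML file itself.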

Pete

George Cristian Bina wrote:

> Hi Paul,
>
> My guess is that the problem is in the dtd file:
> http://127.0.0.1:8083/dtd/eml10.dtd
>
> Regards,
> George
> ---------------------------------------------------------------------
> George Cristian Bina
> <oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
> http://www.oxygenxml.com
>
>
> Paul Ekeland wrote:
>
>> Well, everything indicates there are no hidden characters in front of 
>> the beginning of the file. Either the "debug" command as you 
>> suggested (see results below), or parsing the first characters of the 
>> InputStream until the first '<', both point out that '<' is indeed 
>> the first character encountered.
>>
>> Could it be possible it comes from the encoding of the file?
>> I "iso-8859-1"-ed everything possible though to make every aspect of 
>> the parsing coherent..
>> Where does the prolog start and end? Maybe the problem comes from the 
>> end of the prolog?
>> ...
>>
>> PS: the extract from debug.exe
>>
>> 0D49:0100  3C 3F 78 6D 6C 20 76 65-72 73 69 6F 6E 3D 22 31   <?xml version="1
>> 0D49:0110  2E 30 22 20 65 6E 63 6F-64 69 6E 67 3D 22 69 73   .0" encoding="is
>> 0D49:0120  6F 2D 38 38 35 39 2D 31-22 3F 3E 0A 0A 3C 21 44   o-8859-1"?>..<!D
>> 0D49:0130  4F 43 54 59 50 45 20 55-6E 69 74 2D 6F 66 2D 73   OCTYPE Unit-of-s
>> 0D49:0140  74 75 64 79 0A 20 20 50-55 42 4C 49 43 20 22 2D   tudy.  PUBLIC "-
>> 0D49:0150  2F 2F 4F 55 4E 4C 2F 2F-44 54 44 20 45 4D 4C 2F   //OUNL//DTD EML/
>> 0D49:0160  58 4D 4C 20 62 69 6E 64-69 6E 67 20 31 2E 30 2F   XML binding 1.0/
>> 0D49:0170  31 2E 30 2F 2F 45 4E 22-20 22 68 74 74 70 3A 2F   1.0//EN" "http:/
>>
>> Robert Houben wrote:
>>
>>> This may not be your problem, but I've wasted tons of time in the past
>>> because of these symptoms, so here is why it happened to me...
>>>
>>> I have seen this happen when a file is read that contains byte order
>>> marks at the beginning.  Most editors strip these out and get the
>>> encoding right, so you don't know this is happening.  If you are doing
>>> your own file reader to get an InputStream, you may need to skip a few
>>> bytes at the beginning, setting the encoding value correctly based on
>>> them, prior to setting up the reader. To tell if this is happening to
>>> you on a Windows system, use the debug.exe command from the command
>>> line:
>>>
>>> C:\>debug test.xml
>>> -d
>>> 1480:0100  FF FE 3C 00 74 00 65 00-73 00 74 00 3E 00 74 00   ..<.t.e.s.t.>.t.
>>> 1480:0110  65 00 73 00 74 00 3C 00-2F 00 74 00 65 00 73 00   e.s.t.<./.t.e.s.
>>> 1480:0120  74 00 3E 00 0D 00 0A 00-00 00 00 00 00 00 00 00   t.>.............
>>> 1480:0130  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
>>> 1480:0140  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
>>> 1480:0150  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
>>> 1480:0160  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
>>> 1480:0170  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
>>> -q
>>>
>>> C:\>
>>>
>>> Note that the file starts with "FF FE", which is the UTF-16
>>> little-endian byte order mark (BOM).  If you create your own file reader
>>> and try to pull this in, you will encounter the error that you are
>>> mentioning.  Notepad will show this as normal text; you'll never see
>>> the funny stuff.
>>>
>>> HTH,
>>>
>>> -----Original Message-----
>>> From: Andy Clark [mailto:andyc@apache.org]
>>> Sent: Wednesday, July 27, 2005 5:46 PM
>>> To: j-users@xerces.apache.org
>>> Subject: Re: going crazy with this: org.xml.sax.SAXParseException:
>>> Content is not allowed in prolog
>>>
>>> Paul Ekeland wrote:
>>>  
>>>
>>>> my problem is that I cannot see any whitespace/strange characters
>>>> before the root element of the document. I have used several
>>>> different hexadecimal editors to check that, with no success! Do you
>>>> have a different way to find out of the existence of such things?
>>>>   
>>>
>>>
>>>
>>> Can you attach the first few lines of the file to a
>>> followup message? (Attach, not paste.)
>>>
>>>  
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
>> For additional commands, e-mail: j-users-help@xerces.apache.org
>>
>
>
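
Robert's suggestion quoted above (skip the BOM bytes and pick the encoding
from them before setting up your own reader) could be sketched roughly as
follows. This is a hedged illustration only: BomSniffer, detectBom and
stripBom are hypothetical names, not a Xerces API, and the sketch covers only
the UTF-8 and UTF-16 marks (a UTF-32 mark is not handled). It assumes Java 9
or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of Xerces: detects and skips a leading BOM.
public class BomSniffer {

    /** Returns the encoding implied by a leading BOM, or null if there is none. */
    public static String detectBom(byte[] b, int len) {
        if (len >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (len >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        if (len >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        return null;
    }

    /** Wraps the stream so that reading starts after any UTF-8/UTF-16 BOM. */
    public static PushbackInputStream stripBom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pin.readNBytes(head, 0, 3);      // may be fewer than 3 at EOF
        String enc = detectBom(head, n);
        int skip = enc == null ? 0 : ("UTF-8".equals(enc) ? 3 : 2);
        if (n > skip) {
            pin.unread(head, skip, n - skip);    // push back the non-BOM bytes
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        // First bytes of the test.xml dump quoted above: FF FE 3C 00 ...
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00, 0x74, 0x00};
        System.out.println("BOM encoding: " + detectBom(sample, sample.length));
        InputStream body = stripBom(new ByteArrayInputStream(sample));
        System.out.printf("First byte after BOM: %02X%n", body.read());
    }
}
```

The encoding name returned by detectBom could then be passed to an
InputStreamReader wrapped around the stream returned by stripBom. Note that
Paul's own dump starts with 3C 3F, so this check would report no BOM for his
file.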


