harmony-dev mailing list archives

From Mariano Kamp <mariano.k...@gmail.com>
Subject SAXParseException: Illegal: ]]>
Date Wed, 22 Apr 2009 11:52:32 GMT

  I am not sure if this is the right list, but I thought I'd start
where the stack trace points me ;-)

  I get a SAXParseException when parsing an Atom feed from Google Reader:

org.xml.sax.SAXParseException: Illegal: ]]> (position:START_TAG
<category term='user/xyz/state/com.google/fresh'>@5:15061 in
 	at org.apache.harmony.xml.parsers.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:151)
 	at com.newsrob.U.parseXMLfromInputStream(U.java:45)
 	at com.newsrob.EntriesRetriever.fetchNewEntries(EntriesRetriever.java:299)
 	at com.newsrob.SynchronizationService$4.run(SynchronizationService.java:172)
 	at com.newsrob.SynchronizationService.doSync(SynchronizationService.java:337)
 	at com.newsrob.SynchronizationService.access$0(SynchronizationService.java:86)
 	at com.newsrob.SynchronizationService$1.run(SynchronizationService.java:75)
 	at java.lang.Thread.run(Thread.java:935)
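
The position in the message (`@5:15061`) looks like line 5, column 15061, although that `line:column` reading of the parser's format is an assumption. When most of a feed sits on one long line, that spot is hard to find by eye. The hypothetical helper below cuts out the characters around a 1-based line/column position. It assumes the raw response has first been read into a String, and that the parser fills in `getLineNumber()`/`getColumnNumber()` on the exception, which may not hold for every implementation.

```java
import org.xml.sax.SAXParseException;

// Hypothetical debugging helper; not part of any parser API.
public class ParseErrorContext {

    /** Returns up to `radius` characters on either side of a 1-based line/column position. */
    public static String contextAt(String text, int line, int column, int radius) {
        String[] lines = text.split("\r?\n", -1);
        if (line < 1 || line > lines.length) {
            throw new IllegalArgumentException("no such line: " + line);
        }
        String l = lines[line - 1];
        // Clamp the 0-based index into the line so a column past the end still works.
        int idx = Math.min(Math.max(column - 1, 0), l.length());
        int from = Math.max(0, idx - radius);
        int to = Math.min(l.length(), idx + radius + 1);
        return l.substring(from, to);
    }

    /** Same, using the exception's position; returns null if the parser reported none. */
    public static String contextAt(String text, SAXParseException e, int radius) {
        if (e.getLineNumber() < 1 || e.getColumnNumber() < 1) {
            return null; // SAX uses -1 when the position is unknown
        }
        return contextAt(text, e.getLineNumber(), e.getColumnNumber(), radius);
    }
}
```

In the catch block, something like `ParseErrorContext.contextAt(rawFeed, e, 80)` (with `rawFeed` being the buffered response) would show the markup the parser choked on.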

  I think the problem originates here (see the last category tag):

<category term="user/xyz/state/com.google/reading-list"
scheme="http://www.google.com/reader/" label="reading-list"/><category
scheme="http://www.google.com/reader/" label="fresh"/><category
term="&lt;![CDATA[ Agenda ]]&gt;"/>
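
One way to test whether the feed itself is at fault is to feed just these elements to a different parser. The sketch below assumes a desktop JVM using the JDK's built-in `DocumentBuilderFactory`. The `<feed>` wrapper is a hypothetical root, added only so that the fragment forms a complete document. If the fragment is well-formed, it should parse and the last `term` should decode to the literal text `<![CDATA[ Agenda ]]>`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FeedSnippetCheck {

    // The three <category> elements quoted above, wrapped in a hypothetical <feed> root.
    static final String SNIPPET =
        "<feed>"
        + "<category term=\"user/xyz/state/com.google/reading-list\""
        + " scheme=\"http://www.google.com/reader/\" label=\"reading-list\"/>"
        + "<category scheme=\"http://www.google.com/reader/\" label=\"fresh\"/>"
        + "<category term=\"&lt;![CDATA[ Agenda ]]&gt;\"/>"
        + "</feed>";

    /** Parses the XML and returns the decoded term attribute of the last <category>. */
    static String lastCategoryTerm(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList categories = doc.getElementsByTagName("category");
        Element last = (Element) categories.item(categories.getLength() - 1);
        return last.getAttribute("term");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lastCategoryTerm(SNIPPET));
    }
}
```

If this parses on the desktop but the same bytes fail on the device, that points at the device's parser rather than at Google's output.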

  Any idea why this happens?

  This is the (abbreviated) code I use to parse the stream from Google.

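import java.io.BufferedReader;
import java.io.InputStreamReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setCoalescing(true); // added this later with no effect
DocumentBuilder builder = dbf.newDocumentBuilder();

BufferedReader br = new BufferedReader(
        new InputStreamReader(is, "UTF-8"), 8 * 1024); // is: stream from Google
Document doc = builder.parse(new InputSource(br));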


  Maybe Google doesn't generate proper XML? I don't know. I switched my
code back to DOM (as above) because I got the same problem with kXML,
but I think their documentation states that it doesn't support this
escaping:

  In order to keep kXML as small as possible, no efforts are made to recognize
certain well-formedness errors that would require additional detection code,
such as
   - ']]>' contained in text content,
   - duplicate attributes, and
   - <? followed by a space before the target
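
To see which case the kXML note covers, here is a minimal sketch that runs a few variants through the JDK's built-in parser on a desktop JVM. The class name and test strings are made up for illustration. Under the XML 1.0 grammar, a literal `]]>` is forbidden in text content but allowed inside an attribute value, whether escaped or not. A conforming parser should therefore reject only the first variant. Whether the parser on the device behaves the same way is exactly the open question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CdataEndCheck {

    /** Returns true if the JDK's default parser accepts xml as a well-formed document. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Literal ]]> in text content: not well-formed
        System.out.println(isWellFormed("<category>]]></category>"));
        // ]]> escaped inside an attribute value, as in the Google Reader feed
        System.out.println(isWellFormed("<category term=\"&lt;![CDATA[ Agenda ]]&gt;\"/>"));
        // An unescaped ]]> is also allowed inside an attribute value
        System.out.println(isWellFormed("<category term=\"]]>\"/>"));
    }
}
```

On a standard JDK this should print `false`, `true`, `true`.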

