geode-issues mailing list archives

From "Darren Foong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GEODE-3306) Parsing of cache.xml with whitespace fails with Apache Xerces
Date Fri, 28 Jul 2017 16:39:00 GMT

    [ https://issues.apache.org/jira/browse/GEODE-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105269#comment-16105269 ]

Darren Foong commented on GEODE-3306:
-------------------------------------

After more investigation, I realised that the Apache Xerces parser doesn't call the {{ignorableWhitespace()}}
method for the whitespace between tags; it calls the {{characters()}} method instead. The JDK Xerces, on the other hand,
calls {{ignorableWhitespace()}} for that whitespace, so {{characters()}} is never invoked for it and no whitespace
StringBuffers are pushed onto the stack.

Long story short, it seems that the JDK Xerces doesn't strictly conform to the SAX specification
(https://xerces.apache.org/xerces2-j/faq-sax.html#faq-3), because it calls {{ignorableWhitespace()}}
even though the XML file has no DTD.
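To check which callback a given parser actually uses, a small diagnostic handler can count {{characters()}} and {{ignorableWhitespace()}} calls for an indented document. This is only a sketch: {{WhitespaceProbe}} is a hypothetical name, and it assumes a plain, non-validating parse of an inline string. The stack trace in the issue description shows Geode's parse running through {{XMLSchemaValidator}}, so the counts here may differ from what {{CacheXmlParser}} sees.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical diagnostic: counts how the active SAX parser reports whitespace. */
class WhitespaceProbe extends DefaultHandler {
  int charactersCalls;
  int ignorableCalls;

  @Override
  public void characters(char[] ch, int start, int length) {
    charactersCalls++; // text (including whitespace) reported as character data
  }

  @Override
  public void ignorableWhitespace(char[] ch, int start, int length) {
    ignorableCalls++; // whitespace the parser has classified as ignorable
  }

  /** Parses the given XML with whatever SAXParserFactory JAXP resolves. */
  static WhitespaceProbe probe(String xml) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    SAXParser parser = factory.newSAXParser();
    WhitespaceProbe handler = new WhitespaceProbe();
    parser.parse(new InputSource(new StringReader(xml)), handler);
    return handler;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<cache>\n  <region name=\"r\">\n    <region-attributes/>\n  </region>\n</cache>";
    WhitespaceProbe p = probe(xml);
    System.out.println("SAXParserFactory: " + SAXParserFactory.newInstance().getClass().getName());
    System.out.println("characters() calls: " + p.charactersCalls);
    System.out.println("ignorableWhitespace() calls: " + p.ignorableCalls);
  }
}
```

Running it once with Apache Xerces on the classpath and once without should show which implementation routes the indentation through {{characters()}}. Parsers may split text into several chunks, so the absolute counts matter less than which method receives them.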

Implementing a fix is more complicated than I had thought. One approach is to maintain a parallel
stack recording whether the parser is currently "inside" or "outside" an element, which would let us determine
whether a StringBuffer holds whitespace ("outside") or content ("inside"). This would require
adding one line to every {{startX()}} and {{endX()}} method.

Another workaround is to make Geode use the JDK Xerces by setting the {{javax.xml.parsers.SAXParserFactory}}
system property, and then setting it back to the Apache Xerces implementation for non-Geode work. Those
who face this issue because they use a different JDK can download the Oracle JDK's internal Xerces
from Maven Central (http://central.maven.org/maven2/com/sun/xml/parsers/jaxp-ri/).
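The system-property workaround can be wrapped so the previous value is always restored, even if cache creation throws. This is a sketch under several assumptions: {{SaxFactorySwitch}} and {{withSaxParserFactory}} are hypothetical names; {{com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl}} is the JDK-internal factory class in Oracle JDK/OpenJDK 8, an internal, unsupported name that may change; and it assumes Geode obtains its parser through {{SAXParserFactory.newInstance()}} while the cache is being created.

```java
import java.util.concurrent.Callable;
import javax.xml.parsers.SAXParserFactory;

/** Hypothetical helper: temporarily selects a SAXParserFactory implementation. */
class SaxFactorySwitch {
  static final String PROPERTY = "javax.xml.parsers.SAXParserFactory";
  // JDK-internal Xerces factory (internal API; assumption, not a supported name).
  static final String JDK_FACTORY =
      "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl";

  /** Runs the action with the factory property set, then restores the old value. */
  static <T> T withSaxParserFactory(String factoryClass, Callable<T> action) throws Exception {
    String previous = System.getProperty(PROPERTY);
    System.setProperty(PROPERTY, factoryClass);
    try {
      return action.call();
    } finally {
      if (previous == null) {
        // No explicit setting before: fall back to JAXP's normal lookup
        // (e.g. Apache Xerces registered via META-INF/services).
        System.clearProperty(PROPERTY);
      } else {
        System.setProperty(PROPERTY, previous);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    String used = withSaxParserFactory(JDK_FACTORY,
        () -> SAXParserFactory.newInstance().getClass().getName());
    System.out.println("Inside the block: " + used);
    System.out.println("Property afterwards: " + System.getProperty(PROPERTY));
  }
}
```

In a Geode application the action would be the cache creation, for example {{() -> new CacheFactory().create()}}. System properties are JVM-wide, so any other thread parsing XML at the same moment would also pick up the JDK factory.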

Lastly (for completeness), there's also the workaround mentioned before: simply remove all
ignorable whitespace manually in the XML file, so the underlying parser never has to deal
with ignorable whitespace.
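The manual workaround can also be scripted. The sketch below uses the JDK's DOM and {{Transformer}} APIs to drop text nodes that contain only whitespace and re-serialise the document without indentation. {{WhitespaceStripper}} is a hypothetical name, and the sketch assumes that no element in {{cache.xml}} intentionally contains whitespace-only text. Comments and non-whitespace text are kept; a DOCTYPE, attribute order, or quoting may not survive this simple round trip.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: removes whitespace-only text nodes from an XML document. */
class WhitespaceStripper {
  /** Recursively removes text nodes consisting solely of whitespace. */
  static void stripWhitespaceNodes(Node node) {
    Node child = node.getFirstChild();
    while (child != null) {
      Node next = child.getNextSibling(); // capture before a possible removal
      if (child.getNodeType() == Node.TEXT_NODE && child.getTextContent().trim().isEmpty()) {
        node.removeChild(child);
      } else {
        stripWhitespaceNodes(child);
      }
      child = next;
    }
  }

  /** Parses, strips and re-serialises the document (without an XML declaration). */
  static String strip(String xml) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true); // keep cache.xml namespace declarations intact
    Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    stripWhitespaceNodes(doc);
    Transformer t = TransformerFactory.newInstance().newTransformer();
    t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    t.setOutputProperty(OutputKeys.INDENT, "no");
    StringWriter out = new StringWriter();
    t.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    String xml = "<cache>\n  <region name=\"r\">\n    <region-attributes/>\n  </region>\n</cache>";
    System.out.println(strip(xml));
  }
}
```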

> Parsing of cache.xml with whitespace fails with Apache Xerces
> -------------------------------------------------------------
>
>                 Key: GEODE-3306
>                 URL: https://issues.apache.org/jira/browse/GEODE-3306
>             Project: Geode
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.2.0
>            Reporter: Darren Foong
>            Priority: Minor
>             Fix For: 1.2.0
>
>
> I am using Geode 1.2.0 and Apache Xerces 2.11.0 (not the one included in the Oracle JDK), and I encountered the following error when I tried to programmatically start a cache:
> {noformat}
> org.apache.geode.InternalGemFireError: Did not expected a java.lang.StringBuffer on top of the stack.
> Exception in thread "main" org.apache.geode.InternalGemFireError: Did not expected a java.lang.StringBuffer on top of the stack.
> 	at org.apache.geode.internal.Assert.throwError(Assert.java:94)
> 	at org.apache.geode.internal.Assert.assertTrue(Assert.java:117)
> 	at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.endRegionAttributes(CacheXmlParser.java:1257)
> 	at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.endElement(CacheXmlParser.java:2909)
> 	at org.apache.geode.internal.cache.xmlcache.CacheXmlParser$DefaultHandlerDelegate.endElement(CacheXmlParser.java:3374)
> 	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
> 	at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
> 	at org.apache.xerces.impl.xs.XMLSchemaValidator.emptyElement(Unknown Source)
> 	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
> 	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
> 	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
> 	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> 	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> 	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> 	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> 	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
> 	at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
> 	at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
> 	at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.parse(CacheXmlParser.java:224)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4287)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:1390)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1195)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreate(GemFireCacheImpl.java:758)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:745)
> 	at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:173)
> 	at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:212)
> 	at server.ServerWhitespace.main(ServerWhitespace.java:8)
> {noformat}
> However, this does not happen when I don't use Apache Xerces, i.e. when I rely on the version included in the Oracle JDK (1.8).
> After getting the Geode source code and stepping through the parsing in the Eclipse debugger, I realised that unexpected StringBuffers were being pushed onto the parse stack, which caused the problem.
> These StringBuffers were created and pushed by the {{characters()}} method (https://github.com/apache/geode/blob/develop/geode-core/src/main/java/org/apache/geode/internal/cache/xmlcache/CacheXmlParser.java#L3270). Changing the log level to {{TRACE}} and examining the parse stack showed that these StringBuffers contained the whitespace (including newlines) between the XML tags in {{cache.xml}}.
> When using the Oracle JDK's version of Xerces, these StringBuffers did not appear on the parse stack despite the whitespace.
> I have a proof of concept on GitHub: https://github.com/darrenfoong/geode-parser-poc
> The {{cache.xml}} file without whitespace between the tags was parsed without errors by both versions of Xerces.
> It could be that the JDK Xerces strips out whitespace while Apache Xerces doesn't; if so, the same behaviour could be implemented in {{characters()}} by pushing only non-whitespace char arrays in the {{else}} block. However, there could be other XML parsing edge cases that I am unaware of.
> There should be others who need Apache Xerces for their projects; a fix would be appreciated.
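The {{else}}-block change suggested in the description can be illustrated with a simplified, hypothetical stack-based handler. This is not Geode's {{CacheXmlParser}}: element names stand in for the objects its {{startX()}} methods push, and {{WhitespaceFilteringHandler}} is an invented name. {{characters()}} keeps appending to a StringBuffer already on top of the stack, so text split across several callbacks is still collected, but it only starts a new buffer when the chunk contains non-whitespace characters.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical, simplified stack-based handler; not Geode's CacheXmlParser. */
class WhitespaceFilteringHandler extends DefaultHandler {
  /** Parse stack: element names (String) interleaved with text buffers (StringBuffer). */
  final Deque<Object> stack = new ArrayDeque<>();
  /** Text collected per element, in document order, as "name=text". */
  final List<String> texts = new ArrayList<>();

  @Override
  public void startElement(String uri, String localName, String qName, Attributes attributes) {
    stack.push(qName); // stand-in for the object a real startX() method would push
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    if (stack.peek() instanceof StringBuffer) {
      // Continue an existing text run (SAX may deliver text in several chunks).
      ((StringBuffer) stack.peek()).append(ch, start, length);
    } else if (!isWhitespace(ch, start, length)) {
      // Only start a new buffer for non-whitespace text, so indentation
      // between tags never reaches the stack.
      stack.push(new StringBuffer().append(ch, start, length));
    }
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if (stack.peek() instanceof StringBuffer) {
      texts.add(qName + "=" + stack.pop().toString().trim());
    }
    Object top = stack.pop();
    if (!qName.equals(top)) {
      throw new IllegalStateException("Did not expect " + top + " on top of the stack");
    }
  }

  static boolean isWhitespace(char[] ch, int start, int length) {
    for (int i = start; i < start + length; i++) {
      if (!Character.isWhitespace(ch[i])) {
        return false;
      }
    }
    return true;
  }

  static WhitespaceFilteringHandler parse(String xml) throws Exception {
    WhitespaceFilteringHandler handler = new WhitespaceFilteringHandler();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<cache>\n  <region name=\"r\">\n"
        + "    <class-name>com.example.Foo</class-name>\n  </region>\n</cache>";
    System.out.println(parse(xml).texts);
  }
}
```

{{Character.isWhitespace}} is slightly broader than XML's definition of whitespace (space, tab, carriage return, line feed), and an element whose value is purely whitespace would lose that value; these are the kind of edge cases the description warns about.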



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
