uima-user mailing list archives

From Greg Holmberg <holmberg2...@comcast.net>
Subject Re: ClassLoader problems when using PEAR files
Date Thu, 27 Sep 2012 20:07:46 GMT
This Tika issue confirms the problem:

https://issues.apache.org/jira/browse/TIKA-412 

Jukka Zitting reports:

POI depends on dom4j that in turn depends on the xml-apis jar for some XML-related interfaces
that are nowadays a part of the JRE. Normally having such an extra jar around doesn't harm
anything as normal class loaders will always use the classes provided by the JRE. However
some application servers like JBoss allow applications to override javax.* interfaces, which
causes all sorts of trouble. Thus it's better if we exclude the xml-apis dependency from Tika.

This issue was fixed in Tika 0.8.

So upgrading the version of Tika in TikaAnnotator would definitely solve the problem for UIMA
PEAR users.

Greg


----- Original Message ----- 
From: "Greg Holmberg" <holmberg2066@comcast.net> 
To: msa@schor.com 
Cc: user@uima.apache.org 
Sent: Wednesday, September 26, 2012 6:11:55 PM 
Subject: Re: ClassLoader problems when using PEAR files 

Hi Marshall-- 


I did try that. It told me that DocumentBuilderFactory.newInstance() found an implementation
many times, right up to the point where Tika called it from within the PEAR analysis engine,
where it couldn't find one. Which I already knew :-) 

Before that point, it had found several different implementations, but mostly com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
(the platform fallback). Since this class is in rt.jar (i.e., it's built into the JDK installation),
I was perplexed about how the class loader could fail to find it, especially since I had even called
ResourceManager.setExtensionClassPath(Thread.currentThread().getContextClassLoader(), ...).
That should have allowed the UIMA class loader to fall back to the system class loader, which
should be able to find classes in rt.jar. But it didn't. 
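DocumentBuilderFactory.newInstance() only reaches that platform fallback after three earlier steps have come up empty, so anything that derails an earlier step can keep it from ever getting there. The sketch below models the ordered lookup described in the Javadoc (quoted in the original message further down) in plain Java. It is a simplified illustration, not the JDK's internal implementation: the class and method names are hypothetical, it only returns the class name it would pick, and it omits the real code's caching and error handling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.Properties;
import java.util.ServiceLoader;
import javax.xml.parsers.DocumentBuilderFactory;

/** Hypothetical, simplified model of the JAXP lookup order (not the JDK's real code). */
public class JaxpLookupSketch {
    public static final String KEY = "javax.xml.parsers.DocumentBuilderFactory";
    // The platform fallback mentioned in this thread (the JDK-internal Xerces copy).
    public static final String PLATFORM_DEFAULT =
        "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl";

    /** Returns the implementation class name the ordered lookup would choose. */
    public static String resolveImplementationName(ClassLoader loader) {
        // 1. The javax.xml.parsers.DocumentBuilderFactory system property.
        String fromProperty = System.getProperty(KEY);
        if (fromProperty != null) {
            return fromProperty;
        }
        // 2. lib/jaxp.properties in the JRE directory (newer JDKs look in conf/ instead).
        Path jaxpProps = Paths.get(System.getProperty("java.home"), "lib", "jaxp.properties");
        if (Files.isRegularFile(jaxpProps)) {
            Properties props = new Properties();
            try (InputStream in = Files.newInputStream(jaxpProps)) {
                props.load(in);
                String fromFile = props.getProperty(KEY);
                if (fromFile != null) {
                    return fromFile;
                }
            } catch (IOException e) {
                // Unreadable file: treat it as absent and move on.
            }
        }
        // 3. Services API: META-INF/services/javax.xml.parsers.DocumentBuilderFactory,
        //    searched through the given class loader (e.g., a PEAR's loader).
        Iterator<DocumentBuilderFactory> providers =
            ServiceLoader.load(DocumentBuilderFactory.class, loader).iterator();
        if (providers.hasNext()) {
            return providers.next().getClass().getName();
        }
        // 4. Platform default.
        return PLATFORM_DEFAULT;
    }

    public static void main(String[] args) {
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        System.out.println("Lookup would choose: " + resolveImplementationName(ctx));
    }
}
```

Passing the class loader explicitly makes the key point visible: step 3 depends on which loader is asked, so the same call can behave differently inside and outside the PEAR.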

After extensive experimenting and googling (I hate to admit how many days I spent on this),
I finally figured it out. The conditions are that one is using: 

* Java 1.6 or later (including 1.7) 
* UIMA Addons 2.3.1, specifically the TikaAnnotator and Tika 0.7. 
* PEAR Installer. 

As you know, when you use PEAR files (PackageInstaller), UIMAFramework.produceAnalysisEngine()
creates a new class loader in order to provide an insulated environment based on the classpath
instructions in the PEAR's metadata/install.xml file. 
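Why doesn't falling back to the system class loader help? One plausible mechanism, consistent with Jukka Zitting's remark above that some containers let applications override javax.* interfaces, is a "child-first" loader that searches its own classpath before asking its parent. The class below is a hypothetical, minimal sketch of such a loader built on URLClassLoader; whether the PEAR class loader behaves exactly this way is an assumption here, not something established in this thread. With a loader like this, a javax.xml class packaged in the component's lib dir is used instead of the JDK's copy, and the parent is consulted only when no local copy exists.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first loader: it searches its own URLs before delegating
 * to its parent, the reverse of the JDK's default parent-first delegation.
 */
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // The JVM refuses to let user loaders define java.* classes.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Local copy first: a javax.xml.* class in our own URLs
                        // would shadow the JDK's copy at this point.
                        c = findClass(name);
                    } catch (ClassNotFoundException notLocal) {
                        // Only now fall back to normal parent delegation.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Note that java.* is special-cased only because the JVM forbids redefining it; javax.* gets no such protection, which is exactly why a bundled xml-apis can take precedence.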

In my case, the PEAR file was built by Maven, which I configured (using the "assembly" plug-in)
to unpack the .class files of all the dependencies into the "lib" dir. I wanted to create
an "all in one" PEAR file with all the necessary classes, so I set useTransitiveDependencies
to true. (By the way, you have to exclude org.apache.uima:uimaj-*:jar from the assembly.)


Here's where it goes wrong. Maven smartly follows all the dependencies: TikaAnnotator 2.3.1
-> tika-parsers 0.7 -> poi-ooxml 3.6 -> dom4j 1.6.1 -> xml-apis. The problem is
that xml-apis includes its own copy of the javax.xml package (or some part of it, anyway, I think).
Apparently dom4j predates javax.xml being built into the JDK; now that the package ships with
the JDK, one doesn't need xml-apis. So what happens, I think, is that some implementation
of DocumentBuilderFactory is found in xml-apis, but it is somehow incompatible with the interface
and can't be instantiated. So DocumentBuilderFactory gives up, and doesn't even try the one
in rt.jar (even though the class loader could find it, if asked). 

In short, because xml-apis is in the PEAR file, the system can't find the good DocumentBuilderFactory
in rt.jar. 
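One way to confirm this kind of shadowing is to ask the class loader in question for every copy of the DocumentBuilderFactory class file it can see. The helper below is a hypothetical diagnostic (the names are illustrative) built on the standard ClassLoader.getResources(). Called with the PEAR's loader, for instance getClass().getClassLoader() from code inside the annotator, more than one URL in the output would suggest that a second copy, such as the one from xml-apis, sits on the path alongside the JDK's own.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic: lists every copy of the JAXP factory class a loader can see. */
public class JaxpCopyFinder {
    // Class-file path of the JAXP factory API class.
    public static final String FACTORY_CLASS_FILE =
        "javax/xml/parsers/DocumentBuilderFactory.class";

    /** Returns all locations the loader (and its parents) can serve the class file from. */
    public static List<URL> findCopies(ClassLoader loader) throws IOException {
        return Collections.list(loader.getResources(FACTORY_CLASS_FILE));
    }

    public static void main(String[] args) throws IOException {
        // Inside an annotator you would pass getClass().getClassLoader() instead.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        List<URL> copies = findCopies(loader);
        for (URL copy : copies) {
            System.out.println("found: " + copy);
        }
        if (copies.size() > 1) {
            System.out.println("WARNING: " + copies.size() + " copies are visible; one may shadow the JDK's");
        }
    }
}
```

getResources() is used rather than getResource() because the latter returns only the first match, which would hide the duplicate.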

Solution: remove xml-apis from the PEAR file. 

I did it by changing my pom.xml: 

<dependency>
  <groupId>org.apache.uima</groupId>
  <artifactId>TikaAnnotator</artifactId>
  <version>2.3.1</version>
  <exclusions>
    <exclusion>
      <groupId>xml-apis</groupId>
      <artifactId>xml-apis</artifactId>
    </exclusion>
  </exclusions>
</dependency>
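To check that the exclusion actually took effect, you can scan the installed PEAR's lib directory. Because this build unpacks the dependencies' .class files directly into lib, a leftover xml-apis would show up as files under javax/xml/. The utility below is a hypothetical sketch using java.nio.file; it does not look inside jar files, so it would need extending if your assembly copies jars instead of unpacking them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical post-install check for javax.xml classes unpacked into a PEAR's lib dir. */
public class PearLibCheck {

    /** Returns the lib-relative paths of class files under javax/xml/, sorted. */
    public static List<String> findJavaxXmlClasses(Path libDir) throws IOException {
        try (Stream<Path> files = Files.walk(libDir)) {
            return files
                .filter(Files::isRegularFile)
                // Normalize separators so the check also works on Windows.
                .map(p -> libDir.relativize(p).toString().replace('\\', '/'))
                .filter(rel -> rel.startsWith("javax/xml/") && rel.endsWith(".class"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the installed PEAR's lib directory.
        List<String> hits = findJavaxXmlClasses(Paths.get(args[0]));
        if (hits.isEmpty()) {
            System.out.println("OK: no javax.xml classes bundled");
        } else {
            hits.forEach(h -> System.out.println("bundled: " + h));
        }
    }
}
```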

=========== 

May I suggest that UIMA Add-ons upgrade to a newer version of Tika? 0.7 dates from April 2010;
the current version is 1.2. I'm guessing that a newer version, using a newer POI
and dom4j, wouldn't depend on xml-apis (since that package is now included in
the JDK). I think that would be the best way to allow using TikaAnnotator in PEAR files
on Java 1.6 and later. 


Hope this helps someone. Can I be the only one using TikaAnnotator in PEAR files on Java 1.6?



Greg Holmberg 


----- Original Message ----- 
From: "Marshall Schor" <msa@schor.com> 
To: user@uima.apache.org 
Sent: Wednesday, September 26, 2012 3:57:07 PM 
Subject: Re: ClassLoader problems when using PEAR files 

Hi Greg, 

Did you try troubleshooting this using the "Tip" in the Javadocs for the 
DocumentBuilderFactory class (add -Djaxp.debug=1 to the "java" command line)? 

-Marshall 

On 9/24/2012 6:46 PM, Greg Holmberg wrote: 
> Hi UIMA users-- 
> 
> 
> When I use PEAR files, the XML parser can't find its DocumentBuilderFactory. I think it's a ClassLoader issue. Has anyone else seen this? 
> 
> I install the PEAR as described in the docs: 
> 
> PackageBrowser pkg = PackageInstaller.installPackage(myDir, pearFile, false); 
> 
> String pearDescPath = pkg.getComponentPearDescPath(); 
> 
> ResourceSpecifier specifier = 
> UIMAFramework.getXMLParser().parseResourceSpecifier( 
> new XMLInputSource(pearDescPath)); 
> 
> ResourceManager resmgr = getResourceManager(); 
> 
> AnalysisEngine engine = UIMAFramework.produceAnalysisEngine(specifier, resmgr, params);
> 
> My PEAR includes TikaAnnotator, and I get the exception shown at the end of this email. Summary: TikaConfig asks for an XML parser, but the system can't find one. 
> 
> Outside the analysis engine, it's possible to find an implementation of DocumentBuilderFactory, but inside it seems that the ClassLoader in use doesn't have one. 
> 
> javax.xml.parsers.DocumentBuilderFactory.newInstance() has a complicated way of finding the implementation (quoting the JavaDoc): 
> 
> ======================= 
> 
> Obtain a new instance of a DocumentBuilderFactory. This static method creates a new factory instance. This method uses the following ordered lookup procedure to determine the DocumentBuilderFactory implementation class to load: 
> 
> * Use the javax.xml.parsers.DocumentBuilderFactory system property. 
> * Use the properties file "lib/jaxp.properties" in the JRE directory. This configuration file is in standard java.util.Properties format and contains the fully qualified name of the implementation class with the key being the system property defined above. The jaxp.properties file is read only once by the JAXP implementation and its values are then cached for future use. If the file does not exist when the first attempt is made to read from it, no further attempts are made to check for its existence. It is not possible to change the value of any property in jaxp.properties after it has been read for the first time. 
> * Use the Services API (as detailed in the JAR specification), if available, to determine the classname. The Services API will look for a classname in the file META-INF/services/javax.xml.parsers.DocumentBuilderFactory in jars available to the runtime. 
> * Platform default DocumentBuilderFactory instance. 
> 
> ========================= 
> 
> So it seems like the ClassLoader used in the analysis engine prevents DocumentBuilderFactory from finding even the platform default implementation. 
> 
> Does anyone know how to work around this? Add something to my metadata/install.xml file perhaps? 
> 
> Thanks, 
> 
> 
> Greg Holmberg 
> 
> 
> 
> org.apache.uima.resource.ResourceInitializationException: Error initializing "org.apache.uima.analysis_engine.impl.PearAnalysisEngineWrapper" from descriptor file:/tmp/taservice/pear/SAPAnalysisEngine/SAPAnalysisEngine_pear.xml. 
> at org.apache.uima.util.SimpleResourceFactory.produceResource(SimpleResourceFactory.java:144)
> at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) 
> at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:314) 
> at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:425) 
> at com.sap.taservice.controller.UimaPipeline.createAnalysisEngine(UimaPipeline.java:343)
> at com.sap.taservice.controller.UimaPipeline.execute(UimaPipeline.java:151) 
> at com.sap.taservice.controller.TAServiceWork.execute(TAServiceWork.java:44) 
> at com.sap.job.impl.TaskImpl.execute(TaskImpl.java:104) 
> at com.sap.taservice.job.impl.remote.RemoteWorker.iteration(RemoteWorker.java:52) 
> at com.sap.util.DaemonRunnable.run(DaemonRunnable.java:117) 
> at java.lang.Thread.run(Thread.java:662) 
> Caused by: javax.xml.parsers.FactoryConfigurationError: Provider for javax.xml.parsers.DocumentBuilderFactory cannot be found 
> at javax.xml.parsers.DocumentBuilderFactory.newInstance(Unknown Source) 
> at org.apache.tika.config.TikaConfig.getBuilder(TikaConfig.java:228) 
> at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:66) 
> at org.apache.uima.tika.MarkupAnnotator.initialize(MarkupAnnotator.java:96) 
> at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:252)
> at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:158)
> at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) 
> at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387) 
> at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:255) 
> at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
> at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
> at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
> at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) 
> at org.apache.uima.internal.util.ResourcePool.fillPool(ResourcePool.java:243) 
> at org.apache.uima.internal.util.ResourcePool.<init>(ResourcePool.java:100) 
> at org.apache.uima.internal.util.AnalysisEnginePool.<init>(AnalysisEnginePool.java:91)
> at org.apache.uima.analysis_engine.impl.MultiprocessingAnalysisEngine_impl.initialize(MultiprocessingAnalysisEngine_impl.java:118)
> at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269) 
> at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:314) 
> at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:425) 
> at org.apache.uima.analysis_engine.impl.PearAnalysisEngineWrapper.initialize(PearAnalysisEngineWrapper.java:269)
> at org.apache.uima.util.SimpleResourceFactory.produceResource(SimpleResourceFactory.java:123)
> ... 11 more 
> 
> 

