jackrabbit-users mailing list archives

From Maxime Bégnis <max...@neodoc.biz>
Subject Re: 1.6.2 : JCR-2645 XML text extraction in Jackrabbit 1.x accesses external resources
Date Mon, 19 Jul 2010 11:35:44 GMT
Hi all,

To work around this problem, we patched
jackrabbit-text-extractors/src/main/java/org/apache/jackrabbit/extractor/XMLTextExtractor.java
in version 1.6.2 by adding the following after line 88:

reader.setEntityResolver(handler);

This has worked well for us so far. Could someone tell us whether this
change risks breaking anything elsewhere?
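
For anyone reading without the attachment, here is a minimal standalone
sketch of the idea. It is not the Jackrabbit code: the names
OfflineXmlText, TextHandler and extract are hypothetical, and it assumes
the handler overrides resolveEntity to return an empty InputSource (the
default DefaultHandler implementation returns null, which lets the
parser fetch the DTD itself).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class OfflineXmlText {

    /** Hypothetical stand-in for the extractor's handler. */
    static class TextHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // Collect character data, as a text extractor would.
            text.append(ch, start, length);
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // Assumption: answer every external entity with an empty
            // source so that the parser never opens the referenced URL.
            return new InputSource(new StringReader(""));
        }
    }

    static String extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        TextHandler handler = new TextHandler();
        reader.setContentHandler(handler);
        // The one-line change from the patch: without it, the parser
        // falls back to its default resolution and tries to load the DTD.
        reader.setEntityResolver(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE reference PUBLIC "
            + "\"-//OASIS//DTD DITA Reference//EN\" "
            + "\"http://docs.oasis-open.org/dita/dtd/reference.dtd\">"
            + "<reference><title>Hello</title></reference>";
        System.out.println(extract(xml));
    }
}
```

One consequence of this approach is that entities declared only in the
external DTD are no longer defined, so documents that rely on them may
fail to parse or lose that text.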

The patched XMLTextExtractor.java we are using is attached to this mail.

Thanks.

Maxime Bégnis

On 08/07/2010 17:34, Maxime Bégnis wrote:
> Hi all,
>
> We upgraded our application to Jackrabbit 1.6.2 because we had
> problems indexing XML files that reference an external DTD in their
> DOCTYPEs (more specifically, DITA files). The issue is reported as fixed
> in this version, but we still have the same problem. The following
> warning is printed several times to the log:
>
> WARN org.apache.jackrabbit.core.query.lucene.TextExtractorJob:132 -
> Exception while indexing binary property: java.io.FileNotFoundException:
> http://docs.oasis-open.org/dita/dtd/reference.dtd
>
> I'm wondering whether we are doing something wrong somewhere. Thanks
> in advance for any help.
>
> Bug reference: JCR-2645
>
> These are the Jackrabbit and Jackrabbit-related jars we're using. I've
> put an asterisk on those that differ (newer versions) from the
> Jackrabbit 1.6.2 WAR distribution:
>
> commons-codec-1.4.jar (*)
> commons-collections-3.2.jar (*)
> commons-fileupload-1.2.1.jar
> commons-io-1.4.jar
> concurrent-1.3.4.jar
> derby-10.4.jar (*)
> fontbox-0.1.0.jar
> jackrabbit-api-1.6.2.jar
> jackrabbit-core-1.6.2.jar
> jackrabbit-jcr-commons-1.6.2.jar
> jackrabbit-spi-1.6.2.jar
> jackrabbit-spi-commons-1.6.2.jar
> jackrabbit-text-extractors-1.6.2.jar
> jcr-1.0.jar
> jempbox-0.2.0.jar
> log4j-1.2.15.jar (*)
> lucene-core-2.4.1.jar
> nekohtml-1.9.7.jar
> pdfbox-0.7.3.jar
> poi-3.2-FINAL.jar
> poi-scratchpad-3.2-FINAL.jar
> slf4j-api-1.5.8.jar (*)
> slf4j-log4j12-1.5.8.jar (*)
> xercesImpl-2.9.1.jar (*)
> xml-apis-ext-1.3.04.jar (*)
>
> Maxime Bégnis
>    
