Hi all,

Concerning this problem, we patched jackrabbit-text-extractors/src/main/java/org/apache/jackrabbit/extractor/XMLTextExtractor.java in version 1.6.2 by adding the following line after line 88:

reader.setEntityResolver(handler);

This has worked very well for us so far. Could someone tell us whether this change risks breaking something elsewhere?
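
For anyone who wants to try the idea outside Jackrabbit, here is a minimal standalone sketch of what we understand the change to do. It is not the actual XMLTextExtractor code: the class names DtdSkippingExtractor and TextHandler are made up for illustration, and it assumes the handler overrides resolveEntity to return an empty source, so that the parser never tries to fetch the external DTD. Registering a handler that only inherits the default resolveEntity from DefaultHandler would change nothing.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class, not part of Jackrabbit.
class DtdSkippingExtractor {

    // Collects character data and resolves every external entity to an empty document.
    static class TextHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length).append(' ');
        }

        // Returning an empty source means the DTD URL is never opened.
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }

        String getText() {
            return text.toString().trim();
        }
    }

    static String extractText(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        SAXParser parser = factory.newSAXParser();
        XMLReader reader = parser.getXMLReader();

        TextHandler handler = new TextHandler();
        reader.setContentHandler(handler);
        // The line corresponding to our patch: without it the parser
        // falls back to its default behaviour and fetches the DTD.
        reader.setEntityResolver(handler);

        reader.parse(new InputSource(new StringReader(xml)));
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE reference PUBLIC \"-//OASIS//DTD DITA Reference//EN\" "
                + "\"http://docs.oasis-open.org/dita/dtd/reference.dtd\">"
                + "<reference><title>Hello</title></reference>";
        System.out.println(extractText(xml));
    }
}
```

One trade-off of this approach, as far as we can tell: entities declared only in the skipped DTD are no longer defined, so a document that relies on them could fail to parse or lose that text.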

Attached to this mail is the patched XMLTextExtractor.java we are using.

Thanks.

Maxime Bégnis

On 08/07/2010 17:34, Maxime Bégnis wrote:
Hi all,

We upgraded our application to use the 1.6.2 version because we had
problems indexing XML files referencing an external DTD in their
DOCTYPEs (more specifically DITA files). The issue is stated to be fixed
in this version, but we still have the same problem: the following
warning is printed several times to the log:

WARN org.apache.jackrabbit.core.query.lucene.TextExtractorJob:132 -
Exception while indexing binary property: java.io.FileNotFoundException:
http://docs.oasis-open.org/dita/dtd/reference.dtd

I'm wondering if we are doing something wrong somewhere. Any help
would be appreciated.

Bug reference: JCR-2645

These are the Jackrabbit and Jackrabbit-related jars we're using. I've
put an asterisk on those changed (to newer versions) from the
Jackrabbit 1.6.2 WAR distribution:

commons-codec-1.4.jar (*)
commons-collections-3.2.jar (*)
commons-fileupload-1.2.1.jar
commons-io-1.4.jar
concurrent-1.3.4.jar
derby-10.4.jar (*)
fontbox-0.1.0.jar
jackrabbit-api-1.6.2.jar
jackrabbit-core-1.6.2.jar
jackrabbit-jcr-commons-1.6.2.jar
jackrabbit-spi-1.6.2.jar
jackrabbit-spi-commons-1.6.2.jar
jackrabbit-text-extractors-1.6.2.jar
jcr-1.0.jar
jempbox-0.2.0.jar
log4j-1.2.15.jar (*)
lucene-core-2.4.1.jar
nekohtml-1.9.7.jar
pdfbox-0.7.3.jar
poi-3.2-FINAL.jar
poi-scratchpad-3.2-FINAL.jar
slf4j-api-1.5.8.jar (*)
slf4j-log4j12-1.5.8.jar (*)
xercesImpl-2.9.1.jar (*)
xml-apis-ext-1.3.04.jar (*)

Maxime Bégnis