jackrabbit-users mailing list archives

From Paul Bernet <paul.ber...@crealogix.com>
Subject Problem with Indexing XML Docs with Tika in Jackrabbit 2.6.2
Date Wed, 17 Jul 2013 16:40:48 GMT
Hi,

I am migrating a Jackrabbit instance from 2.2.13 to 2.6.2, using:
jackrabbit-core
jackrabbit-jcr-commons
jackrabbit-jcr-rmi
For indexing I am using tika-core and parts of tika-parsers.
The full tika-parsers module causes problems (among others, its aspectjrt-1.6.x.jar
conflicts with my one-jar package meccano), so I am trying to include only the parser classes
and dependencies needed to index .pdf and .xml files. Indexing via the PDFParser works, but
the DcXMLParser is never executed and no XML content ends up in the index.
When I configure the EmptyParser for the application/xml MIME type instead, the EmptyParser
is not called either.

What confuses me is that the PDFParser entry is read from tika-config.xml (I can prove this
by deliberately misspelling the class name) and the parser is called at runtime.
The XML parser entry is read as well, but that parser is never called at runtime.

tika-config.xml
...
<mimeTypeRepository resource="/org/apache/tika/mime/tika-mimetypes.xml" magic="false"/>
<parsers>
<parser name="parse-pdf" class="org.apache.tika.parser.pdf.PDFParser">
   <mime>application/pdf</mime>
</parser>

<parser name="parse-dcxml" class="org.apache.tika.parser.xml.DcXMLParser">
  <mime>application/xml</mime>
  <mime>image/svg+xml</mime>
</parser>

<parser class="org.apache.tika.parser.DefaultParser"/>
<parser class=" org.apache.tika.parser.EmptyParser ">
  <!--  <mime>application/xml</mime> -->
</parser>
</parsers>
....

The XML files have the MIME type application/xml.
The other configuration file, /resources/META-INF/services/org.apache.tika.parser.Parser, is
in a sub-jar of the one-jar package. Because that had no effect, I moved it outside the jar
and referenced it explicitly on the classpath at startup, but that made no difference either.
Is this file needed for the parsers to work?
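
To narrow down the service-file question, a small diagnostic along the following lines might help. It is only a sketch using the plain JDK, not anything from Jackrabbit or Tika: it lists every copy of the service file that a given class loader can see. The class name ServiceFileCheck and the method findResources are made up for illustration. It assumes the standard Java service-provider convention, where the file is looked up as META-INF/services/<interface name> relative to a classpath root, so a copy stored under an extra /resources/ prefix inside a jar would probably not be found.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, named here only for illustration.
public class ServiceFileCheck {

    // Location where the standard service-provider lookup expects the file,
    // relative to a classpath root (no leading slash, no "resources/" prefix).
    static final String PARSER_SERVICE_FILE =
            "META-INF/services/org.apache.tika.parser.Parser";

    // Returns every copy of the named resource visible to the given class loader.
    static List<URL> findResources(ClassLoader loader, String name) {
        try {
            return Collections.list(loader.getResources(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the class loader that loaded this class is the one of interest.
        // Inside a one-jar package a different loader may be the relevant one.
        ClassLoader loader = ServiceFileCheck.class.getClassLoader();
        List<URL> hits = findResources(loader, PARSER_SERVICE_FILE);
        if (hits.isEmpty()) {
            System.out.println("Not visible: " + PARSER_SERVICE_FILE);
        }
        for (URL url : hits) {
            System.out.println("Found: " + url);
        }
    }
}
```

Run from inside the packaged application, with the same class loader that loads tika-core, this shows whether the file is reachable at all. It does not show whether Tika consults the file for parsers that are listed explicitly in tika-config.xml.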

Thanks for any hints!
Paul
