jackrabbit-dev mailing list archives

From "Dan Ducar (JIRA)" <j...@apache.org>
Subject [jira] Updated: (JCR-2642) JackrabbitParser and tika 0.7 parser
Date Mon, 07 Jun 2010 08:14:07 GMT

     [ https://issues.apache.org/jira/browse/JCR-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Ducar updated JCR-2642:

    Issue Type: Improvement  (was: New Feature)

> JackrabbitParser and tika 0.7 parser
> ------------------------------------
>                 Key: JCR-2642
>                 URL: https://issues.apache.org/jira/browse/JCR-2642
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>    Affects Versions: 2.1.0
>            Reporter: Dan Ducar
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
> Hi,
> I was trying to implement a custom parser and found the following problem.
> Since Tika 0.7 it is possible to implement a custom parser and register it in a service
> provider configuration file (META-INF/services/org.apache.tika.parser.Parser). That way
> there is no need to maintain a custom tika-config.xml file just to add a custom parser.
> The problem I ran into is in JackrabbitParser: it gave me no way to instantiate the
> AutoDetectParser with its default constructor, which would set it up using the default
> TikaConfig constructor.
> Basically, from Tika 0.7 on, TikaConfig.getTikaConfig() instantiates the TikaConfig
> using the default constructor instead of reading the tika-config.xml file from within the
> package. The default constructor reads the service provider configuration files and
> populates the parsers map.
> What I'm proposing is to change JackrabbitParser to instantiate the AutoDetectParser
> using the default constructor. That way, with Tika versions >= 0.7 we could easily
> implement our own parsers and there would be no reason to maintain the tika-config.xml.
> A sort of "backward" compatibility would also be maintained, because with the
> AutoDetectParser default constructor the TikaConfig is obtained through
> TikaConfig.getTikaConfig(), which for Tika versions < 0.7 calls the TikaConfig(InputStream)
> constructor, which reads the configuration directly from the package.
> Basically, the JackrabbitParser constructor should look like this:
>     public JackrabbitParser() {
>         parser = new AutoDetectParser();
>     }
> Thanks,
> Dan
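
The service provider configuration file mentioned in the issue follows the standard JDK
provider-configuration format, so the discovery step can be sketched with
java.util.ServiceLoader alone. This is only an illustration under stated assumptions, not
Tika's or Jackrabbit's actual code: DemoParser and CsvDemoParser are hypothetical stand-ins
for org.apache.tika.parser.Parser and a custom parser, and the provider file is written to a
temporary directory at runtime instead of being packaged in a jar.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceProviderSketch {

    /** Hypothetical stand-in for org.apache.tika.parser.Parser. */
    public interface DemoParser {
        String mediaType();
    }

    /** Hypothetical custom parser; providers need a public no-argument constructor. */
    public static class CsvDemoParser implements DemoParser {
        @Override
        public String mediaType() {
            return "text/csv";
        }
    }

    /** Writes META-INF/services/<service class name> under root, naming one provider class. */
    static void writeProviderFile(Path root, Class<?> service, Class<?> provider)
            throws IOException {
        Path dir = root.resolve("META-INF").resolve("services");
        Files.createDirectories(dir);
        Files.write(dir.resolve(service.getName()),
                (provider.getName() + "\n").getBytes(StandardCharsets.UTF_8));
    }

    /** Discovers DemoParser providers visible from root and returns their media types. */
    static List<String> discoverMediaTypes(Path root) throws IOException {
        URL[] urls = { root.toUri().toURL() };
        // The parent loader supplies the provider classes; root supplies the provider file.
        try (URLClassLoader loader =
                new URLClassLoader(urls, ServiceProviderSketch.class.getClassLoader())) {
            List<String> types = new ArrayList<>();
            for (DemoParser parser : ServiceLoader.load(DemoParser.class, loader)) {
                types.add(parser.mediaType());
            }
            return types;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("spi-sketch");
        writeProviderFile(root, DemoParser.class, CsvDemoParser.class);
        System.out.println(discoverMediaTypes(root));
    }
}
```

In a real deployment the provider file would ship inside the jar that contains the custom
parser, and no XML configuration would have to list that parser explicitly.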

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
