manifoldcf-dev mailing list archives

From "Julien Massiera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1428) Allow tika config parameter
Date Thu, 01 Jun 2017 14:46:04 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033092#comment-16033092 ]

Julien Massiera commented on CONNECTORS-1428:
---------------------------------------------

A given parser configuration may be very specific to the repository or folder being crawled,
especially if you want to replace, for example, the standard DcXML parser with your own in
order to get very different metadata/content extraction behaviour for XML files.
Having the configuration at the connector level means creating a different Tika connector
for each configuration.
Maybe it is still best to leave the configuration on this side; what do you recommend?
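
To make the trade-off concrete, here is a minimal sketch of how a single connector could
serve several configurations if the config path were supplied per job or per repository.
It assumes a hypothetical `ParserCache` helper that is not part of ManifoldCF or Tika; the
loader function is also an assumption and stands in for whatever actually builds the parser.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper: keeps one parser instance per config path, so that
 * jobs with different Tika configs can share the same connector.
 * P is the parser type (for Tika this would be org.apache.tika.parser.Parser).
 */
final class ParserCache<P> {
    // Key: normalized config path; the empty string stands for "default config".
    private final Map<String, P> parsers = new ConcurrentHashMap<>();
    // Builds a parser from a config path; supplied by the caller.
    private final Function<String, P> loader;

    ParserCache(Function<String, P> loader) {
        this.loader = loader;
    }

    /** Returns the cached parser for this path, building it on first use. */
    P parserFor(String configPath) {
        String key = (configPath == null) ? "" : configPath.trim();
        return parsers.computeIfAbsent(key, loader);
    }

    /** Number of distinct configurations loaded so far. */
    int size() {
        return parsers.size();
    }
}
```

In a Tika-based connector the loader could wrap
`new AutoDetectParser(new TikaConfig(new File(path)))`, converting the checked exceptions
thrown by the `TikaConfig` constructor into an unchecked one, and fall back to the default
configuration when the path is empty.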

> Allow tika config parameter
> ---------------------------
>
>                 Key: CONNECTORS-1428
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1428
>             Project: ManifoldCF
>          Issue Type: Wish
>          Components: Tika extractor
>    Affects Versions: ManifoldCF 2.7
>            Reporter: Julien Massiera
>            Assignee: Karl Wright
>            Priority: Minor
>             Fix For: ManifoldCF 2.8
>
>         Attachments: CONNECTORS-1428.patch
>
>
> It would be nice to have an option to pass a Tika config file to the connector through the UI.
> The connector would load it in the "TikaParser" class, like:
> private static Parser parser = new AutoDetectParser(new TikaConfig(new File("path/to/file")));
> This is just an example, of course; it has to be done properly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
