manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1428) Allow tika config parameter
Date Thu, 01 Jun 2017 18:53:04 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033487#comment-16033487 ]

Karl Wright commented on CONNECTORS-1428:
-----------------------------------------

Hi [~julienFL], can you explain what this is trying to do?

{code}
 public class TikaParser {
 
-  private static Parser parser = new AutoDetectParser();
+  private static Parser parser = null;
+  private static String currentConfig = null;
 
   private TikaParser() { }
+  
+  public static synchronized void initParser(final String tikaConfig) {
+    if (!tikaConfig.equals(currentConfig)) {
+      InputStream is = new ByteArrayInputStream(tikaConfig.getBytes());
+      try {
+        TikaConfig conf = new TikaConfig(is);
+        parser = new AutoDetectParser(conf);
+        currentConfig = tikaConfig;
+      } catch (TikaException | IOException | SAXException e) {
+        parser = new AutoDetectParser();
+      }
+      
+      Map<MediaType, Parser> parsers = ((AutoDetectParser) parser).getParsers();
+      parsers.put(MediaType.APPLICATION_XML, new HtmlParser());
+      ((AutoDetectParser) parser).setParsers(parsers);
+    }
+  }
{code}

It looks like the TikaParser class needs to be rearranged to allow configurability.  I'll
take care of that if that's the intent.
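
One way such a rearrangement could look, as a minimal sketch only: keep the parser in a small cache keyed on the config string and rebuild it only when that string changes. The class name ConfiguredInstanceCache and both factory parameters are hypothetical (they are not part of ManifoldCF or Tika), and the sketch uses only the JDK so it runs without Tika on the classpath.

```java
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical helper, not part of ManifoldCF or Tika: holds one instance
 * built from a configuration string and rebuilds it only when the string changes.
 */
final class ConfiguredInstanceCache<T> {
  // Builds an instance from a non-empty config string; may throw a RuntimeException.
  private final Function<String, T> configuredFactory;
  // Builds the default instance used when there is no config or the config is unusable.
  private final Supplier<T> defaultFactory;

  private String currentConfig = null;
  private T current = null;

  ConfiguredInstanceCache(Function<String, T> configuredFactory, Supplier<T> defaultFactory) {
    this.configuredFactory = Objects.requireNonNull(configuredFactory);
    this.defaultFactory = Objects.requireNonNull(defaultFactory);
  }

  /** Returns the cached instance, rebuilding it first if the config differs from the last call. */
  synchronized T get(final String config) {
    // Objects.equals is null-safe, so a null config does not throw here.
    if (current != null && Objects.equals(config, currentConfig)) {
      return current;
    }
    T built;
    if (config == null || config.isEmpty()) {
      built = defaultFactory.get();
    } else {
      try {
        built = configuredFactory.apply(config);
      } catch (RuntimeException e) {
        // Fall back to the default instance when the config cannot be used.
        built = defaultFactory.get();
      }
    }
    current = built;
    // Remember the config even after a failure, so a bad config is not re-parsed on every call.
    currentConfig = config;
    return current;
  }
}
```

In the connector, the configured factory would presumably wrap the new AutoDetectParser(new TikaConfig(is)) call from the patch, rethrowing its checked exceptions as unchecked ones, and the default factory would be new AutoDetectParser(). Unlike the patch, this sketch records a failed config as well, which is a design choice rather than something the patch does.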





> Allow tika config parameter
> ---------------------------
>
>                 Key: CONNECTORS-1428
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1428
>             Project: ManifoldCF
>          Issue Type: Wish
>          Components: Tika extractor
>    Affects Versions: ManifoldCF 2.7
>            Reporter: Julien Massiera
>            Assignee: Karl Wright
>            Priority: Minor
>             Fix For: ManifoldCF 2.8
>
>         Attachments: CONNECTORS-1428.patch, CONNECTORS-1428v2.patch
>
>
> It would be nice to have an option to pass a Tika config file to the connector through the UI.
> The connector would load it in the "TikaParser" class like this:
> private static Parser parser = new AutoDetectParser(new TikaConfig(new File("path/to/file")));
> This is just an example, of course; it has to be done properly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
