manifoldcf-dev mailing list archives

From "Shinichiro Abe (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-1234) TikaExtractor based indexing on Elasticsearch connector
Date Thu, 27 Aug 2015 22:58:45 GMT


Shinichiro Abe commented on CONNECTORS-1234:

bq. they are streamed
Does this mean the home-grown ES httpclient? If so, it does not construct a document in memory,
which differs from the SolrJ client. I'll post very large files later; I don't have a large file
at the moment. Thanks.
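
For illustration only, here is a minimal sketch of what "streamed" could look like for base64 content, as opposed to building the whole encoded document in memory first. It is not the connector's actual code: the class name StreamingBase64Sketch and the helper writeEncoded are hypothetical, and the only library call assumed is the JDK's java.util.Base64 encoder wrapper (Java 8 or later).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch, not taken from the ManifoldCF code base.
class StreamingBase64Sketch {

    // Copies the document through a Base64-encoding stream in fixed-size
    // chunks, so only one 8 KB buffer is held in memory at a time.
    static void writeEncoded(InputStream document, OutputStream requestBody)
            throws IOException {
        // wrap() encodes on the fly. Closing the wrapper writes the final
        // padding and also closes the underlying stream.
        try (OutputStream encoder = Base64.getEncoder().wrap(requestBody)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = document.read(buffer)) != -1) {
                encoder.write(buffer, 0, read);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for a file and an HTTP request body.
        InputStream in = new ByteArrayInputStream(
                "hello".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeEncoded(in, out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

With a real file and a real HTTP request body in place of the in-memory streams, peak memory stays at the buffer size regardless of file size, which is the property that matters for the very large file test.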

> TikaExtractor based indexing on Elasticsearch connector
> -------------------------------------------------------
>                 Key: CONNECTORS-1234
>                 URL:
>             Project: ManifoldCF
>          Issue Type: Improvement
>            Reporter: Shinichiro Abe
>            Assignee: Shinichiro Abe
>         Attachments: CONNECTORS-1234.patch
> We could add a use-mapper-attachments flag.
> It would default to true, the current behavior, which requires the mapper-attachments plugin on the ES side.
> If false, it allows us to index the content and metadata that are extracted from files through
the Tika transformer, which means there is no need to install that plugin and put base64 encoded

This message was sent by Atlassian JIRA