manifoldcf-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-1219) Lucene Output Connector
Date Fri, 10 Jul 2015 13:41:05 GMT


Michael McCandless commented on CONNECTORS-1219:

We could possibly patch Lucene to allow stored=true for Reader-valued fields as well, but this
is probably quite tricky: for example, the codec APIs (StoredFieldsFormat) would need to
accept a Reader too.

Even if we did that, though, a very large document can still be problematic. You should first
test using a Reader for indexing only, without storing the field: even that may put too much
pressure on the heap, because IndexWriter must hold all tokens for that one document in heap
before it can write a new segment.
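
To illustrate the heap concern, here is a minimal toy sketch in plain Java. It is not Lucene
code and not how IndexWriter is implemented; the class ReaderTokenBuffer and the method
bufferTokens are hypothetical names made up for this example. It only shows that reading
the text through a Reader in chunks avoids holding the raw text in memory, while the tokens
for the one document still accumulate until the document ends.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical class, not part of Lucene or ManifoldCF.
public class ReaderTokenBuffer {

    /**
     * Reads text from a Reader in fixed-size chunks, splits it on
     * non-letter/non-digit characters, lowercases each token, and keeps
     * every token in a list until the whole document has been consumed.
     * The raw text is never held as one String, but the token list still
     * grows with the size of the document.
     */
    static List<String> bufferTokens(Reader reader) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char[] chunk = new char[4096];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = chunk[i];
                if (Character.isLetterOrDigit(c)) {
                    current.append(Character.toLowerCase(c));
                } else if (current.length() > 0) {
                    // A separator ends the current token.
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
        }
        // Flush the last token if the text did not end with a separator.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        try (Reader reader = new StringReader("A very large document, streamed in chunks.")) {
            List<String> tokens = bufferTokens(reader);
            System.out.println(tokens.size() + " tokens buffered: " + tokens);
        }
    }
}
```

In this toy model the memory cost moves from the raw text to the buffered tokens, which is
the effect worth measuring with a real, very large document before relying on a Reader.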

> Lucene Output Connector
> -----------------------
>                 Key: CONNECTORS-1219
>                 URL:
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Shinichiro Abe
>            Assignee: Shinichiro Abe
>         Attachments: CONNECTORS-1219-v0.1patch.patch, CONNECTORS-1219-v0.2.patch
> An output connector that writes directly to a local Lucene index, not via a remote search
> engine. It would be nice if we could use the various Lucene APIs on the index directly, even
> though we could do the same thing with a Solr or Elasticsearch index. I assume we could do
> things such as classification, categorization, and tagging, using e.g. the
> lucene-classification package.

