manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-1219) Lucene Output Connector
Date Wed, 15 Jul 2015 12:10:05 GMT


Karl Wright commented on CONNECTORS-1219:

Hi Abe-san,

No, it is not necessary to serialize the IndexWriter.  I think you may have misunderstood the
proposal.  So, to make it clear:

(1) ALL Lucene activity would happen in one sidecar process, including the Lucene searcher
and the separate Jetty instance it would run under
(2) ManifoldCF would have multiple processes
(3) Communication between the ManifoldCF processes and the Lucene process would be via a socket
(4) The socket protocol would be either Java-serialization-based RMI (which I would need to
research) or some other low-level protocol.  The goal would be to NOT use REST, XML,
JSON, or any other heavyweight, open protocol.
(5) An open protocol is undesirable because we definitely don't want to reinvent
ElasticSearch, Solr, or any other Lucene wrapper.  The reason to have a separate process,
though, is that Lucene's memory and disk model is inconsistent with ManifoldCF's.
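
To illustrate points (3) and (4), here is a rough sketch of one ManifoldCF-side client talking
to a sidecar over a socket using plain Java serialization.  This is not RMI and not anything
from the attached patches; it is only the "some other low-level protocol" option under
assumptions.  The names SidecarSketch, IndexRequest, startSidecar and send are all made up
for the example, and the sidecar here just acknowledges the request instead of touching a
real Lucene index.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class SidecarSketch {

    /** Hypothetical message: one document to be indexed by the sidecar. */
    public static class IndexRequest implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String documentUri;
        public final String content;

        public IndexRequest(String documentUri, String content) {
            this.documentUri = documentUri;
            this.content = content;
        }
    }

    /** Stand-in for the Lucene sidecar: accepts one connection and acknowledges one request. */
    public static Thread startSidecar(ServerSocket server) {
        Thread t = new Thread(() -> {
            try (Socket s = server.accept();
                 ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
                // Flush the stream header first so the peer's ObjectInputStream can open.
                out.flush();
                ObjectInputStream in = new ObjectInputStream(s.getInputStream());
                IndexRequest req = (IndexRequest) in.readObject();
                // Assumption: a real sidecar would pass the request to its index writer here.
                out.writeObject("indexed:" + req.documentUri);
                out.flush();
            } catch (IOException | ClassNotFoundException e) {
                throw new RuntimeException(e);
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Stand-in for a ManifoldCF process: sends one request and waits for the reply. */
    public static String send(int port, IndexRequest req)
            throws IOException, ClassNotFoundException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
            out.writeObject(req);
            out.flush();
            ObjectInputStream in = new ObjectInputStream(s.getInputStream());
            return (String) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Port 0 asks the OS for any free port on the loopback interface.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            startSidecar(server);
            String reply = send(server.getLocalPort(),
                    new IndexRequest("file:///tmp/a.txt", "hello"));
            System.out.println(reply);
        }
    }
}
```

Both sides open their ObjectOutputStream before their ObjectInputStream, since the input
stream constructor blocks until it has read the header written by the other end.  In the
actual proposal there would be many ManifoldCF processes sharing the one sidecar, so a real
version would need an accept loop and some form of connection handling.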

Does this make sense?

> Lucene Output Connector
> -----------------------
>                 Key: CONNECTORS-1219
>                 URL:
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Shinichiro Abe
>            Assignee: Shinichiro Abe
>         Attachments: CONNECTORS-1219-v0.1patch.patch, CONNECTORS-1219-v0.2.patch, CONNECTORS-1219-v0.3.patch
> An output connector that writes to a local Lucene index directly, not via a remote search
> engine. It would be nice if we could use the various Lucene APIs on the index directly, even
> though we could do the same thing with a Solr or Elasticsearch index. I assume we could do
> classification, categorization, and tagging using, e.g., the lucene-classification package.

This message was sent by Atlassian JIRA
