Mailing-List: contact dev-help@manifoldcf.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@manifoldcf.apache.org
Date: Wed, 15 Jul 2015 12:10:05 +0000 (UTC)
From: "Karl Wright (JIRA)" <jira@apache.org>
To: dev@manifoldcf.apache.org
Message-ID: <JIRA.12841547.1435636068000.194161.1436962205588@Atlassian.JIRA>
In-Reply-To: <JIRA.12841547.1435636068000@Atlassian.JIRA>
References: <JIRA.12841547.1435636068000@Atlassian.JIRA>
 <JIRA.12841547.1435636068525@arcas>
Subject: [jira] [Commented] (CONNECTORS-1219) Lucene Output Connector
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627952#comment-14627952 ] 

Karl Wright commented on CONNECTORS-1219:
-----------------------------------------

Hi Abe-san,

No, it is not necessary to serialize IndexWriter.  I think you may have misunderstood the proposal.  To make it clear:

(1) ALL Lucene activity would happen in one sidecar process, including the Lucene searcher and the separate Jetty instance it would run under
(2) ManifoldCF would have multiple processes
(3) Communication between the ManifoldCF processes and the Lucene process would be via a socket
(4) The socket protocol would either be Java-serialization-based RMI (which I would need to research), or some other low-level protocol.  The goal would be to NOT use REST or XML or JSON or any other heavyweight, open protocol.
(5) The reason an open protocol is undesirable is that we definitely don't want to reinvent ElasticSearch, Solr, or any other Lucene wrapper.  The reason to have a separate process, though, is that Lucene's memory and disk model is inconsistent with ManifoldCF's.
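
To make the shape of (3) and (4) more concrete, here is a rough sketch of one ManifoldCF-side call to the sidecar over a socket, using plain Java object serialization.  It is only an illustration under assumptions: it uses raw ObjectOutputStream/ObjectInputStream rather than RMI, every name in it (SidecarSketch, AddDocument, startSidecar, send) is hypothetical, and a map stands in for the Lucene index so the example has no Lucene dependency.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names exist in ManifoldCF or Lucene.
class SidecarSketch {

  /** Hypothetical request message: one document to add to the index. */
  static class AddDocument implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    final String body;
    AddDocument(String id, String body) { this.id = id; this.body = body; }
  }

  /**
   * Starts the stand-in "Lucene process" on an ephemeral loopback port.
   * A real sidecar would own the single IndexWriter; here a map plays the index.
   */
  static ServerSocket startSidecar(Map<String, String> index) throws IOException {
    ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    Thread acceptor = new Thread(() -> {
      while (!server.isClosed()) {
        try (Socket s = server.accept();
             ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
             ObjectInputStream in = new ObjectInputStream(s.getInputStream())) {
          AddDocument req = (AddDocument) in.readObject();
          index.put(req.id, req.body);      // assumption: real code would call into Lucene here
          out.writeObject("OK " + req.id);  // acknowledge only after the write succeeded
          out.flush();
        } catch (IOException | ClassNotFoundException e) {
          // Server socket closed or bad message: this sketch just drops the connection.
        }
      }
    });
    acceptor.setDaemon(true);
    acceptor.start();
    return server;
  }

  /** What a ManifoldCF process would do: send one request and wait for the acknowledgement. */
  static String send(int port, AddDocument doc) throws IOException, ClassNotFoundException {
    try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
         ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
      out.writeObject(doc);
      out.flush();  // the request must be on the wire before we block on the reply
      try (ObjectInputStream in = new ObjectInputStream(s.getInputStream())) {
        return (String) in.readObject();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> index = new ConcurrentHashMap<>();
    try (ServerSocket sidecar = startSidecar(index)) {
      String ack = send(sidecar.getLocalPort(), new AddDocument("doc-1", "hello"));
      System.out.println(ack + " indexed=" + index.size());
    }
  }
}
```

The point of the sketch is that only small request and reply objects cross the process boundary, while IndexWriter itself never leaves the sidecar.  Whether RMI or a hand-rolled protocol like this is the better fit is still open.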

Does this make sense?


> Lucene Output Connector
> -----------------------
>
>                 Key: CONNECTORS-1219
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Shinichiro Abe
>            Assignee: Shinichiro Abe
>         Attachments: CONNECTORS-1219-v0.1patch.patch, CONNECTORS-1219-v0.2.patch, CONNECTORS-1219-v0.3.patch
>
>
> An output connector that writes directly to a local Lucene index, not via a remote search engine. It would be nice if we could use the various Lucene APIs on the index directly, even though we could do the same thing with a Solr or Elasticsearch index. I assume we could do things such as classification, categorization, and tagging, using e.g. the lucene-classification package.

