Date: Wed, 15 Jul 2015 12:10:05 +0000 (UTC)
From: "Karl Wright (JIRA)"
To: dev@manifoldcf.apache.org
Reply-To: dev@manifoldcf.apache.org
Subject: [jira] [Commented] (CONNECTORS-1219) Lucene Output Connector

    [ https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627952#comment-14627952 ]

Karl Wright commented on CONNECTORS-1219:
-----------------------------------------

Hi Abe-san,

No, it is not necessary to serialize indexwriter. I think you may misunderstand the proposal.

So to make it clear:

(1) ALL Lucene activity would happen in one sidecar process, including the Lucene searcher and a separate Jetty instance it would run under.
(2) ManifoldCF would have multiple processes.
(3) Communication between the ManifoldCF processes and the Lucene process would be via a socket.
(4) The socket protocol would either be Java-serialization-based RMI (which I would need to research), or some other low-level protocol. The goal would be to NOT use REST or XML or JSON or any other heavyweight, open protocol.
(5) An open protocol is undesirable because we definitely don't want to reinvent ElasticSearch, Solr, or any other Lucene wrapper. The reason to have a separate process, though, is that Lucene's memory and disk model is inconsistent with ManifoldCF's.

Does this make sense?

> Lucene Output Connector
> -----------------------
>
>                 Key: CONNECTORS-1219
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Shinichiro Abe
>            Assignee: Shinichiro Abe
>         Attachments: CONNECTORS-1219-v0.1patch.patch, CONNECTORS-1219-v0.2.patch, CONNECTORS-1219-v0.3.patch
>
>
> A output connector for Lucene local index directly, not via remote search engine. It would be nice if we could use Lucene various API to the index directly, even though we could do the same thing to the Solr or Elasticsearch index. I assume we can do something to classification, categorization, and tagging, using e.g lucene-classification package.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
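
For illustration only, the sidecar arrangement in points (1) through (4) of the comment could look roughly like the sketch below. It uses plain Java serialization over a raw loopback socket rather than RMI, which the comment leaves open. All names here (`LuceneSidecarSketch`, `IndexRequest`, `IndexResponse`, `Sidecar`, `send`) are hypothetical and do not come from ManifoldCF or from any of the attached patches. An in-memory map stands in for the Lucene index so the example runs without Lucene on the classpath.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LuceneSidecarSketch {

    /** Hypothetical wire message: one document to index. */
    static final class IndexRequest implements Serializable {
        private static final long serialVersionUID = 1L;
        final String documentId;
        final HashMap<String, String> fields;

        IndexRequest(String documentId, Map<String, String> fields) {
            this.documentId = documentId;
            this.fields = new HashMap<>(fields); // defensive, serializable copy
        }
    }

    /** Hypothetical wire message: the sidecar's reply. */
    static final class IndexResponse implements Serializable {
        private static final long serialVersionUID = 1L;
        final boolean ok;
        final String message;

        IndexResponse(boolean ok, String message) {
            this.ok = ok;
            this.message = message;
        }
    }

    /**
     * Stand-in for the single sidecar process. A real one would own the Lucene
     * writer and searcher; here a map plays the role of the index (assumption).
     */
    static final class Sidecar implements Closeable {
        private final ServerSocket server;
        private final Map<String, Map<String, String>> index = new ConcurrentHashMap<>();

        Sidecar() throws IOException {
            // Loopback only: Java deserialization must not face untrusted peers.
            server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
            Thread acceptor = new Thread(this::acceptLoop, "sidecar-acceptor");
            acceptor.setDaemon(true);
            acceptor.start();
        }

        int port() { return server.getLocalPort(); }

        int size() { return index.size(); }

        private void acceptLoop() {
            while (!server.isClosed()) {
                try (Socket s = server.accept()) {
                    ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
                    out.flush(); // send the stream header before reading
                    ObjectInputStream in = new ObjectInputStream(s.getInputStream());
                    IndexRequest req = (IndexRequest) in.readObject();
                    index.put(req.documentId, req.fields);
                    out.writeObject(new IndexResponse(true, "indexed " + req.documentId));
                    out.flush();
                } catch (IOException | ClassNotFoundException e) {
                    // Socket closed or unreadable message; the loop re-checks isClosed().
                }
            }
        }

        @Override
        public void close() throws IOException { server.close(); }
    }

    /** What a ManifoldCF-side process would call: one request, one response. */
    static IndexResponse send(int port, IndexRequest req)
            throws IOException, ClassNotFoundException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
            out.writeObject(req);
            out.flush();
            ObjectInputStream in = new ObjectInputStream(s.getInputStream());
            return (IndexResponse) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        try (Sidecar sidecar = new Sidecar()) {
            IndexResponse r = send(sidecar.port(),
                    new IndexRequest("doc-1", Collections.singletonMap("title", "hello")));
            System.out.println(r.ok + " " + r.message);
        }
    }
}
```

In this sketch each request opens a fresh connection, which keeps the example short; a real connector would presumably reuse connections and would need to handle errors, deletions, and commits, none of which the comment specifies.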