manifoldcf-dev mailing list archives

From "Shinichiro Abe (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-1219) Lucene Output Connector
Date Thu, 16 Jul 2015 02:43:04 GMT


Shinichiro Abe commented on CONNECTORS-1219:

Yes, that works for the separate-process and RMI setup. But there is still a serialization problem.
I'm not sure about RMI (though I read ManifoldCF in Action yesterday), but when the MCF connection
invokes the method that adds or replaces a document via RMI, the class that has that method has
to implement Serializable. This class may hold a LuceneClient, which has an IndexWriter.
Is this correct? If so, it probably will not work. In that case, it would work if the class does
not hold a LuceneClient and the method just puts the document into something like a queue,
from which LuceneClient then picks it up. But that approach is not good enough for me in terms of
indexing latency.
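
The queue idea above could look roughly like the following Java sketch. It is only an illustration, not the patch attached to this issue: DocumentRequest, WriterStandIn, submit, startConsumer and indexAll are hypothetical names, and WriterStandIn replaces the real LuceneClient/IndexWriter so the example runs with the JDK alone. Only the small serializable request object would cross the RMI boundary; the writer stays in the process that owns the index.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueHandoffSketch {

    /** Plain data describing one add-or-replace call; safe to serialize. */
    static final class DocumentRequest implements Serializable {
        private static final long serialVersionUID = 1L;
        final String uri;
        final String body;

        DocumentRequest(String uri, String body) {
            this.uri = uri;
            this.body = body;
        }
    }

    /** Sentinel that tells the consumer thread to stop. */
    static final DocumentRequest STOP = new DocumentRequest("", "");

    /** Hypothetical stand-in for LuceneClient and its IndexWriter. */
    static final class WriterStandIn {
        final List<String> indexedUris = new ArrayList<>();

        void addOrReplace(DocumentRequest request) {
            indexedUris.add(request.uri);
        }
    }

    /** Producer side: the connector method only enqueues, it never touches the writer. */
    static void submit(BlockingQueue<DocumentRequest> queue, DocumentRequest request)
            throws InterruptedException {
        queue.put(request);
    }

    /** Consumer side: runs in the process that owns the index and drains the queue. */
    static Thread startConsumer(BlockingQueue<DocumentRequest> queue, WriterStandIn writer) {
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    DocumentRequest request = queue.take();
                    if (request == STOP) {
                        return;
                    }
                    writer.addOrReplace(request);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        return consumer;
    }

    /** Pushes all requests through the queue and returns the URIs in indexing order. */
    static List<String> indexAll(List<DocumentRequest> requests) throws InterruptedException {
        BlockingQueue<DocumentRequest> queue = new LinkedBlockingQueue<>();
        WriterStandIn writer = new WriterStandIn();
        Thread consumer = startConsumer(queue, writer);
        for (DocumentRequest request : requests) {
            submit(queue, request);
        }
        submit(queue, STOP);
        consumer.join(); // join() makes the consumer's writes visible to this thread
        return writer.indexedUris;
    }

    public static void main(String[] args) throws Exception {
        List<DocumentRequest> docs = Arrays.asList(
                new DocumentRequest("doc-1", "hello"),
                new DocumentRequest("doc-2", "world"));
        System.out.println(indexAll(docs)); // prints [doc-1, doc-2]
    }
}
```

The cost of this design is the latency concern raised above: a document is indexed only when the consumer gets to it, not when the connector method returns.
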
A few months ago I was looking for the lowest-latency indexing implementation for a pull crawler
model. At that time I used Apache Spark and Apache Ignite running on distributed nodes, both of
which require serializable classes. I used Lucene indexes on local disk and on HDFS,
but everything I tried ended in failure because of IndexWriter serialization. After that I thought
MCF could become the best low-latency indexing application if we set up a single MCF
process on each node, with each node having its own index. But this idea does not fit the MCF
multi-process model.
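
For readers unfamiliar with the failure mentioned above, the following Java sketch reproduces the general problem with the JDK only. It is an assumption-laden illustration, not the Spark or Ignite code that was actually tried: WriterStandIn is a hypothetical placeholder for a non-serializable object such as an open index writer, and the transient-field variant is one common workaround, not something proposed in this issue.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializationSketch {

    /** Hypothetical stand-in for an open index writer; deliberately not Serializable. */
    static final class WriterStandIn {
    }

    /** Declared Serializable, but its writer field is not, so serialization fails. */
    static final class TaskHoldingWriter implements Serializable {
        private static final long serialVersionUID = 1L;
        final WriterStandIn writer = new WriterStandIn();
    }

    /** Ships only the index location; the writer is reopened lazily on the receiving node. */
    static final class TaskWithTransientWriter implements Serializable {
        private static final long serialVersionUID = 1L;
        final String indexPath;
        transient WriterStandIn writer; // skipped by serialization, null after deserialization

        TaskWithTransientWriter(String indexPath) {
            this.indexPath = indexPath;
        }

        WriterStandIn writer() {
            if (writer == null) {
                writer = new WriterStandIn(); // a real task would open the index at indexPath
            }
            return writer;
        }
    }

    /** Returns true if Java serialization accepts the object, false on NotSerializableException. */
    static boolean canSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new TaskHoldingWriter()));                    // prints false
        System.out.println(canSerialize(new TaskWithTransientWriter("/tmp/index"))); // prints true
    }
}
```

The transient variant only moves the problem: every node still has to open its own writer, which matches the one-index-per-node idea above.
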

> Lucene Output Connector
> -----------------------
>                 Key: CONNECTORS-1219
>                 URL:
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Shinichiro Abe
>            Assignee: Shinichiro Abe
>         Attachments: CONNECTORS-1219-v0.1patch.patch, CONNECTORS-1219-v0.2.patch, CONNECTORS-1219-v0.3.patch
> An output connector that writes to a local Lucene index directly, not via a remote search engine. It
> would be nice if we could use the various Lucene APIs on the index directly, even though we could
> do the same thing with a Solr or Elasticsearch index. I assume we could do things like classification,
> categorization, and tagging, using e.g. the lucene-classification package.

This message was sent by Atlassian JIRA
