manifoldcf-dev mailing list archives

From "Shinichiro Abe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1219) Lucene Output Connector
Date Fri, 17 Jul 2015 05:18:04 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630787#comment-14630787 ]

Shinichiro Abe commented on CONNECTORS-1219:
--------------------------------------------

Thanks [~apillaiz], I'd like to collect not only web content but also content from ManifoldCF
repositories.

[~DaddyWri], I discovered [OakDirectory|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexEditorContext.java#L89],
which extends the Lucene Directory class. According to the comment I saw there, they also had
a multi-process (cluster) problem with the Lucene index, and they store the index in Blob objects,
which means MongoDB or RDB storage. That led me to the idea of switching the Directory implementation:
for instance, we use FSDirectory when mcf runs as a single process, and [HdfsDirectory|http://lucene.apache.org/solr/5_2_1/solr-core/org/apache/solr/store/hdfs/HdfsDirectory.html]
when mcf runs as multiple processes. Writes to HDFS were [slow|https://github.com/ouava/lclient/blob/master/lclient-hdfs/src/main/java/org/apache/lucene/lclient/util/HdfsUtils.java#L47]
when I tried it before, but I expect this to improve.
I don't want to use RMI, for these reasons. First, I want to avoid complicated operation, or adding
two extra bootstrap steps in single-process mode. Second, I don't know how to write the test code.
Third, among the users I know, only one runs multi-process, and everyone wants to run mcf as close
to out of the box (OOTB) as possible. Fourth, Jackrabbit 2 has an RMI API but Oak doesn't have one.
I think RMI is not cool as well as CMIS rather than JCR. Fifth, I want to make mcf easy to use.
These are not technical reasons, but HdfsDirectory will help us.
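
As a rough sketch of the switching idea above, the choice of Directory implementation could be isolated in one small helper. This is not taken from any patch attached to this issue: the class name DirectorySelector, the multiProcess flag, and both index locations are hypothetical, and the sketch deliberately uses only the JDK so that it runs without Lucene, Solr, or Hadoop on the classpath. A real connector would open the returned location with FSDirectory.open(...) or construct an HdfsDirectory at the marked places.

```java
public class DirectorySelector {

    /** The two Lucene Directory implementations discussed in the comment. */
    enum Backend { FS_DIRECTORY, HDFS_DIRECTORY }

    /**
     * Hypothetical rule: a single-process deployment keeps the index on local disk,
     * a multi-process deployment keeps it on storage that every process can reach.
     */
    static Backend choose(boolean multiProcess) {
        return multiProcess ? Backend.HDFS_DIRECTORY : Backend.FS_DIRECTORY;
    }

    /** Returns the index location that matches the chosen backend. */
    static String indexLocation(Backend backend, String localPath, String hdfsUri) {
        switch (backend) {
            case HDFS_DIRECTORY:
                // A real connector would construct an HdfsDirectory for this URI here.
                return hdfsUri;
            default:
                // A real connector would call FSDirectory.open(...) on this path here.
                return localPath;
        }
    }

    public static void main(String[] args) {
        // Both locations are placeholders, not values from the issue.
        String localPath = "/var/mcf/lucene-index";
        String hdfsUri = "hdfs://namenode:8020/mcf/lucene-index";

        for (boolean multiProcess : new boolean[] {false, true}) {
            Backend backend = choose(multiProcess);
            System.out.println(backend + " -> " + indexLocation(backend, localPath, hdfsUri));
        }
    }
}
```

Keeping the decision in one method means single-process mode needs no extra bootstrap step, which is the main reason given above for preferring this over RMI.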


> Lucene Output Connector
> -----------------------
>
>                 Key: CONNECTORS-1219
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Shinichiro Abe
>            Assignee: Shinichiro Abe
>         Attachments: CONNECTORS-1219-v0.1patch.patch, CONNECTORS-1219-v0.2.patch, CONNECTORS-1219-v0.3.patch
>
>
> An output connector that writes to a local Lucene index directly, not via a remote search engine.
> It would be nice if we could use Lucene's various APIs on the index directly, even though we could
> do the same thing with a Solr or Elasticsearch index. I assume we can do something for classification,
> categorization, and tagging, using e.g. the lucene-classification package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
