manifoldcf-dev mailing list archives

From "Shinichiro Abe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1219) Lucene Output Connector
Date Tue, 11 Aug 2015 04:12:45 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681211#comment-14681211 ]

Shinichiro Abe commented on CONNECTORS-1219:
--------------------------------------------

Progress report: opening multiple IndexWriters on the same index with NoLockFactory corrupts
the index.
{noformat}
ERROR 2015-08-04 08:17:27,565 (Worker thread '32') - Exception tossed: org.apache.lucene.index.CorruptIndexException:
codec footer mismatch (file truncated?): actual footer=1768776044 vs expected footer=-1071082520
(resource=_br_Lucene50_0.pos)
org.apache.manifoldcf.core.interfaces.ManifoldCFException: org.apache.lucene.index.CorruptIndexException:
codec footer mismatch (file truncated?): actual footer=1768776044 vs expected footer=-1071082520
(resource=_br_Lucene50_0.pos)
{noformat}

In Oak, even though there are multiple IndexWriters, in practice only a single thread in the
cluster writes to an index.
http://markmail.org/thread/2awr5or54vpexzx2

In MCF I think we have three alternatives.
* Use LockManager.enterWriteLock() in multiprocess mode to take a global lock and guarantee
a single writer while writing.
 (But it didn't work when I tried it. Maybe the code I wrote was incorrect. Also, a single
writer gives up fast parallel indexing, so I don't want to use that.)
* Use RMI.
 (There is no other way at this time, and this will require much time to implement.)
* This connector doesn't support multiprocess mode unless MCF supports removeDocument per
process.
 (Does this violate MCF's multiprocess specification?)

I'm likely to give up on this connector unless I get some help. I'll postpone this ticket
for the time being.
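
For illustration, the single-writer idea in the first alternative can be sketched as below. This is a minimal sketch under stated assumptions, not the connector's code: `GlobalWriteLock`, `InProcessLock`, and `SingleWriterSketch` are hypothetical names, and an in-JVM `ReentrantLock` stands in for a cluster-wide lock such as the one `LockManager.enterWriteLock()` is meant to provide. The index write itself is replaced by a counter.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class SingleWriterSketch {
    /** Hypothetical stand-in for a cluster-wide write lock. */
    interface GlobalWriteLock {
        void enter() throws InterruptedException;
        void leave();
    }

    /** Assumption: an in-JVM lock only; a real deployment needs a cross-process lock. */
    static final class InProcessLock implements GlobalWriteLock {
        private final ReentrantLock lock = new ReentrantLock();
        public void enter() throws InterruptedException { lock.lockInterruptibly(); }
        public void leave() { lock.unlock(); }
    }

    private final GlobalWriteLock lock;
    private final AtomicInteger activeWriters = new AtomicInteger();
    private final AtomicInteger maxConcurrentWriters = new AtomicInteger();
    private final AtomicInteger documentsWritten = new AtomicInteger();

    SingleWriterSketch(GlobalWriteLock lock) { this.lock = lock; }

    /** Every write goes through the global lock, so at most one writer is active. */
    void addDocument(String id) throws InterruptedException {
        lock.enter();
        try {
            int active = activeWriters.incrementAndGet();
            maxConcurrentWriters.accumulateAndGet(active, Math::max);
            // A real connector would write the document to the index here.
            documentsWritten.incrementAndGet();
            activeWriters.decrementAndGet();
        } finally {
            lock.leave(); // always release, even if the write throws
        }
    }

    /** Runs several worker threads; returns {max concurrent writers, documents written}. */
    static int[] run(int threads, int docsPerThread) throws InterruptedException {
        SingleWriterSketch sketch = new SingleWriterSketch(new InProcessLock());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int worker = t;
            pool.submit(() -> {
                for (int d = 0; d < docsPerThread; d++) {
                    try {
                        sketch.addDocument(worker + "-" + d);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return new int[] { sketch.maxConcurrentWriters.get(), sketch.documentsWritten.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = run(8, 100);
        System.out.println("max concurrent writers: " + result[0] + ", documents: " + result[1]);
    }
}
```

With the lock in place, the observed maximum of concurrent writers stays at one however many worker threads run. That is also the cost noted above: writes are serialized, so parallel indexing throughput is lost.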



> Lucene Output Connector
> -----------------------
>
>                 Key: CONNECTORS-1219
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Shinichiro Abe
>            Assignee: Shinichiro Abe
>         Attachments: CONNECTORS-1219-v0.1patch.patch, CONNECTORS-1219-v0.2.patch, CONNECTORS-1219-v0.3.patch
>
>
> An output connector that writes directly to a local Lucene index, not via a remote search
> engine. It would be nice if we could use Lucene's various APIs on the index directly, even
> though we could do the same thing with a Solr or Elasticsearch index. I assume we could do
> things like classification, categorization, and tagging, using e.g. the
> lucene-classification package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
