jackrabbit-dev mailing list archives

From "Ian Boston (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-169) Make Jackrabbit clusterable
Date Thu, 31 Aug 2006 23:02:32 GMT
    [ http://issues.apache.org/jira/browse/JCR-169?page=comments#action_12431999 ] 
Ian Boston commented on JCR-169:

Search:
I assume this means the Lucene indexes?
In case you haven't got to it already...

I'm interested in this Jira issue because I also want to run in a DB cluster.
I've just finished implementing a search engine based on Lucene in such a cluster, where the
only thing shared is the DB. It's in production in one or two places with ~10 GB of index
segments on 3+ cluster nodes. The implementation is not that great (compared to Nutch), but
here is what I found on the way.

Lucene segments stored in the DB only work in Oracle (and perhaps other DBs), where there is
reasonable seek performance on BLOBs. MySQL (for instance) is hopeless at BLOB seeks. Indexes
on a shared filesystem generate lots of network traffic. NDFS (the MapReduce file system) is
great but a complete pain to set up, as is an rsync-based strategy for segment distribution.
I found the best indexing strategy was to keep local copies of the segments on each node, with
the master copies stored centrally. When a node in the cluster performs an index operation, a
new master segment is created and the other nodes sync with the master segments.
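
As a rough illustration of the sync step described above, here is a minimal Java sketch of how
a node might work out which master segments to copy locally. It is not the code used in
production: the class SegmentSync, the Plan type, and the idea of a numeric version stamp per
segment are all assumptions made for the example, as is the rule that a local segment with no
master copy gets deleted.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SegmentSync {
    /** Segment names to copy from the master store and to drop locally (hypothetical type). */
    public static final class Plan {
        public final Set<String> toFetch = new TreeSet<>();
        public final Set<String> toDelete = new TreeSet<>();
    }

    /**
     * Compares master segments against the local copies.
     * Keys are segment names; values are version stamps (an assumption, e.g. a timestamp).
     */
    public static Plan plan(Map<String, Long> master, Map<String, Long> local) {
        Plan p = new Plan();
        for (Map.Entry<String, Long> e : master.entrySet()) {
            Long have = local.get(e.getKey());
            // Fetch when the segment is missing locally or the master copy is newer.
            if (have == null || have < e.getValue()) {
                p.toFetch.add(e.getKey());
            }
        }
        for (String name : local.keySet()) {
            // Assumption: a local segment with no master copy was merged away or deleted.
            if (!master.containsKey(name)) {
                p.toDelete.add(name);
            }
        }
        return p;
    }
}
```

Each node would run something like this periodically and then copy or remove files
accordingly, so only changed segments cross the network.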

In the search application, the speed at which segment updates propagate is not that critical;
you probably have a different requirement in JCR.

The only points in this strategy that require a distributed lock are when segments are merged
(which has to be done to reduce the number of open files) and when documents are deleted from
the Lucene index.
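
To make the locking rule above concrete, here is a small Java sketch in which adding a segment
takes no cluster lock while merging does. Everything in it is hypothetical: the ClusterLock
interface, the in-process LocalLock stand-in (a real lock might be a row in the shared DB),
and the assumption that new segments get unique names and so never conflict.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class SegmentMaintenance {
    /** Hypothetical cluster-wide lock; a real one might be backed by the shared DB. */
    public interface ClusterLock {
        boolean tryAcquire();
        void release();
    }

    /** In-process, non-reentrant stand-in so the sketch runs on a single JVM. */
    public static final class LocalLock implements ClusterLock {
        private final AtomicBoolean held = new AtomicBoolean(false);
        public boolean tryAcquire() { return held.compareAndSet(false, true); }
        public void release() { held.set(false); }
    }

    private final ClusterLock lock;
    private final List<String> segments = new ArrayList<>();
    private int mergeCount = 0;

    public SegmentMaintenance(ClusterLock lock) { this.lock = lock; }

    /** Adding a segment needs no cluster lock (assumes unique segment names). */
    public synchronized void addSegment(String name) { segments.add(name); }

    /** Merging rewrites existing segments, so it only runs while holding the lock. */
    public synchronized boolean mergeAll() {
        if (!lock.tryAcquire()) {
            return false; // another node is merging or deleting; try again later
        }
        try {
            if (segments.size() > 1) {
                String merged = "merged-" + (++mergeCount);
                segments.clear();
                segments.add(merged);
            }
            return true;
        } finally {
            lock.release();
        }
    }

    /** Returns a copy of the current segment names. */
    public synchronized List<String> segments() { return new ArrayList<>(segments); }
}
```

Deleting documents would take the same lock in the same way; only the body of the guarded
section changes.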

As I said, the strategy works in production for 50 x 200 MB segments on 3+ cluster nodes
without excessive network traffic. If there were an easy NDFS setup that could be coded in
Java, that would probably be a better solution.

The project is www.sakaiproject.org, where I would also like to use Jackrabbit :)

> Make Jackrabbit clusterable
> ---------------------------
>                 Key: JCR-169
>                 URL: http://issues.apache.org/jira/browse/JCR-169
>             Project: Jackrabbit
>          Issue Type: New Feature
>          Components: core
>            Reporter: Marcel Reutegger
>            Priority: Minor
> This jira issue discusses the technical implications on the current design of Jackrabbit
> to introduce clustering.
> Particularly the following areas require thorough investigation:
> - SharedItemStateManager and its cache
>     - cache integrity
>     - cache design: look aside, write through?
>     - hook for distributed cache, interface?
>     - isolation level
>     - transaction integrity within Jackrabbit, interaction with transient layer
> - VirtualItemStateProvider
>     - same strategy as SharedItemStateManager?
> - Search index
>     - single or per cluster node index?
> - Observation
> Please state more areas if needed.

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

