jackrabbit-dev mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: [jira] Commented: (JCR-169) Make Jackrabbit clusterable
Date Fri, 01 Sep 2006 07:59:33 GMT
I'm replying to the list rather than Jira, since this is OT wrt JCR-169.

So, if you had 50 x 200MB of Lucene index segments, for example, and 
wanted them to be accessible in a cluster environment, would Jackrabbit 
be a good place to put those segments?

The big killer for Lucene is the ability to seek efficiently on the 
central blob (I think), but presumably by choosing the right Binary 
storage strategy that comes partially for free ?
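
To make the seek question concrete, here is a minimal sketch of one
possible storage layout: the blob is split into fixed-size chunks, so a
read at an arbitrary offset touches only the chunks it overlaps. This is
an illustration under stated assumptions, not Jackrabbit's API or an
existing storage strategy; read_range, the chunk mapping and the chunk
size are all hypothetical names chosen for the example.

```python
from typing import Dict


def split_into_chunks(data: bytes, chunk_size: int) -> Dict[int, bytes]:
    """Hypothetical writer: store a blob as numbered fixed-size chunks."""
    return {
        i // chunk_size: data[i:i + chunk_size]
        for i in range(0, len(data), chunk_size)
    }


def read_range(chunks: Dict[int, bytes], chunk_size: int,
               offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset`, fetching only needed chunks.

    `chunks` stands in for whatever the store would fetch by key; here it
    is an in-memory dict purely for illustration.
    """
    out = bytearray()
    pos, end = offset, offset + length
    while pos < end:
        index, start = divmod(pos, chunk_size)
        chunk = chunks.get(index)
        if chunk is None or start >= len(chunk):
            break  # past the end of the blob
        take = min(len(chunk) - start, end - pos)
        out += chunk[start:start + take]
        pos += take
    return bytes(out)
```

With a single sequential stream, the same read would have to skip over
everything before the offset, which is the cost the question above is
worried about.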

If this is the case, I could replace my, slightly odd, segment 
distribution mechanism with Jackrabbit.
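
For reference, the kind of segment distribution I mean can be sketched
roughly as below: masters live in a central location and each node pulls
any segment files it does not yet have into a local copy. This is a
simplified sketch, not my actual code. It assumes every file is written
once and never modified (a file that gets rewritten, such as a deleted
documents file, would need separate handling), and sync_segments and the
file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def sync_segments(master: Path, local: Path) -> List[str]:
    """Copy files present in `master` but missing in `local`.

    Returns the names of the files copied, in sorted order.
    """
    local.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(master.iterdir()):
        if not src.is_file():
            continue
        dst = local / src.name
        if dst.exists():
            # Write-once assumption: same name means same content.
            continue
        # Copy to a temporary name first, then rename, so a reader never
        # sees a half-copied segment under its final name.
        tmp = local / (src.name + ".part")
        shutil.copyfile(src, tmp)
        tmp.replace(dst)
        copied.append(src.name)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        master = Path(root) / "master"
        master.mkdir()
        (master / "seg_1.cfs").write_bytes(b"first segment")
        print(sync_segments(master, Path(root) / "node-a"))
```

Because nothing is ever overwritten, running the sync again is a no-op
until a new segment appears in the master location.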

Last question,
Is JCR-169 being actively worked on ?
Is there an area where another pair of hands would help... I would like 
to be able to deploy Jackrabbit in a cluster.


Marcel Reutegger (JIRA) wrote:
>     [ http://issues.apache.org/jira/browse/JCR-169?page=comments#action_12432083 ] 
> Marcel Reutegger commented on JCR-169:
> --------------------------------------
> Ian, thanks a lot for your comments.
> Here are my current thoughts on clustering the search index in jackrabbit:
> I think the preferred approach is to put the index into the repository itself. See: http://article.gmane.org/gmane.comp.apache.jackrabbit.devel/8530 and following messages
> This would also allow us to distribute index updates to cluster nodes using the repository internal observation mechanism, e.g. the update of a deleted documents file or new index segments.
>> I found the best indexing strategy was to have local copies of segments, stored centrally as masters.
> I agree. Specifically the design of Lucene where index files are only created but never modified supports this approach very nicely.
>> In the search application, speed of update of segments is not that critical,
>> you probably have a different requirement in JCR.
> JCR is more restrictive in that respect, at least if we want to be compliant with the specification. As soon as a node is created in the workspace it must be searchable using a query. For most real life systems this is not a hard requirement though. E.g. when a document is added to a repository, it usually doesn't matter if it is retrievable by query only after a couple of seconds and not right away.
>> Make Jackrabbit clusterable
>> ---------------------------
>>                 Key: JCR-169
>>                 URL: http://issues.apache.org/jira/browse/JCR-169
>>             Project: Jackrabbit
>>          Issue Type: New Feature
>>          Components: core
>>            Reporter: Marcel Reutegger
>>            Priority: Minor
>> This jira issue discusses the technical implications on the current design of Jackrabbit to introduce clustering.
>> Particularly the following areas require thorough investigation:
>> - SharedItemStateManager and its cache
>>     - cache integrity
>>     - cache design: look aside, write through?
>>     - hook for distributed cache, interface?
>>     - isolation level
>>     - transaction integrity within Jackrabbit, interaction with transient layer
>> - VirtualItemStateProvider
>>     - same strategy as SharedItemStateManager?
>> - Search index
>>     - single or per cluster node index?
>> - Observation
>> Please state more areas if needed.
