jackrabbit-users mailing list archives

From "Ard Schrijvers" <a.schrijv...@onehippo.com>
Subject RE: Jackrabbit overlapping w/ Lucene-Solr
Date Tue, 06 May 2008 08:22:43 GMT
Hello,

> > ... Isn't there a bit of an overlap?...
> 
> Yes, when it comes to indexing and searching.
> 
> Jackrabbit and Solr both base their indexing stuff on Lucene, 
> and add some features on top of it.
> 
> That set of additional features could probably be (at least 
> partially) factored out and moved to Lucene as extensions 
> that would be used by both projects, but there are also 
> significant differences due to the way indexes are used by 
> Solr (as a core functionality) and Jackrabbit (as one module 
> that's more tightly integrated with the storage features). So 
> that's probably not as trivial as it might seem.

Exactly, and to add to that: first of all, you should not compare
Lucene/Solr with Jackrabbit indexing. If you want to compare something,
you should compare Solr with Jackrabbit, because both are built on top
of the Lucene search engine. And there are important differences between
the indexing *requirements* of Solr and Jackrabbit. Solr can update many
documents in the background, warm up the updated index in the
background, and then, every X seconds (or minutes), replace the
currently used IndexSearcher with the newly pre-warmed one. This is
clearly not really live: changes only become searchable after the next
swap.
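To make the "not really live" point concrete, here is a minimal sketch of the model described above, assuming a toy in-memory index rather than Solr's real commit/warming machinery. The class and method names (PeriodicallyReopenedIndex, commit_and_swap) are hypothetical. Writes go into a pending buffer, and searches only see them once a new snapshot (the "searcher") has been built and swapped in.

```python
import threading


class PeriodicallyReopenedIndex:
    """Hypothetical sketch of a Solr-style model: updates are buffered and
    become searchable only when a new, pre-built snapshot is swapped in."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}   # doc_id -> text, not yet searchable
        self._snapshot = {}  # the "current searcher": never mutated in place

    def add(self, doc_id, text):
        with self._lock:
            self._pending[doc_id] = text

    def commit_and_swap(self):
        # Build ("warm") a new snapshot from the old one plus pending updates,
        # then install it with a single reference assignment. Searches do not
        # take the lock, so they keep using the old snapshot until the swap.
        with self._lock:
            merged = dict(self._snapshot)
            merged.update(self._pending)
            self._pending.clear()
            self._snapshot = merged

    def search(self, word):
        snapshot = self._snapshot  # grab the current searcher once
        return sorted(d for d, text in snapshot.items() if word in text.split())


idx = PeriodicallyReopenedIndex()
idx.add("doc1", "jackrabbit content repository")
print(idx.search("jackrabbit"))  # [] : not visible until the next swap
idx.commit_and_swap()
print(idx.search("jackrabbit"))  # ['doc1']
```

In a real deployment a background timer would call something like commit_and_swap every X seconds; the window between add() and the next swap is exactly the latency that Jackrabbit cannot accept.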

Jackrabbit, OTOH, needs its index(es) to be up to date at all times!
After a session.save(), all changes have to be accounted for in the
index(es), and every search result needs to reflect these changes
instantly. (To be precise, the indexing is queued, but it must be
finished before a search is executed; therefore the search request is
blocked until indexing is done.) Also, you need to realize that
hierarchical queries (i.e., those starting with some path) are resolved
within the index!
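The "queued, but finished before a search" behaviour can be sketched as follows. This is a simplified illustration under my own assumptions, not Jackrabbit's actual code: QueuedConsistentIndex and its methods are hypothetical names. save() only enqueues the indexing work and returns, a background thread applies it, and search() waits on a condition variable until nothing is queued or in progress.

```python
import threading
from collections import deque


class QueuedConsistentIndex:
    """Hypothetical sketch: save() queues indexing work, search() blocks
    until that queue is drained, so results reflect every saved change."""

    def __init__(self):
        self._cond = threading.Condition()
        self._queue = deque()    # pending (doc_id, text) updates
        self._in_progress = 0    # updates dequeued but not yet applied
        self._docs = {}          # the searchable index: doc_id -> tokens
        threading.Thread(target=self._index_loop, daemon=True).start()

    def save(self, doc_id, text):
        # Enqueue the change and return immediately, like a fast save().
        with self._cond:
            self._queue.append((doc_id, text))
            self._cond.notify_all()

    def _index_loop(self):
        while True:
            with self._cond:
                while not self._queue:
                    self._cond.wait()
                doc_id, text = self._queue.popleft()
                self._in_progress += 1
            tokens = set(text.split())  # the "slow" indexing work, done outside the lock
            with self._cond:
                self._docs[doc_id] = tokens
                self._in_progress -= 1
                self._cond.notify_all()  # wake any search waiting for the queue to drain

    def search(self, word):
        with self._cond:
            # Block until every queued update has been applied to the index.
            self._cond.wait_for(lambda: not self._queue and self._in_progress == 0)
            return sorted(d for d, toks in self._docs.items() if word in toks)


idx = QueuedConsistentIndex()
idx.save("/content/a", "hello repository")
print(idx.search("hello"))  # ['/content/a'] : the search waited for the queued update
```

The design trade-off is the opposite of the Solr sketch: writes stay cheap, and the cost of freshness is paid by a search that happens to arrive while indexing is still pending.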
IMHO, Jackrabbit's requirements are much harder and more complex than
Solr's. Also, IMHO, Jackrabbit's index structure uses a better-chosen
technique for fast incremental updates than Solr does. I talked to some
Lucene and Solr committers at the last ApacheCon, and they were quite
interested in this Jackrabbit architecture (in short: Jackrabbit does
not use one single Lucene index, but many indexes, which behave much
like the segments within a single Lucene index; see [1]).
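A rough sketch of the "many indexes, like segments" idea, assuming a toy design of my own rather than Jackrabbit's real implementation (see [1] for that). MultiIndex, max_volatile and merge_factor are hypothetical names, not Jackrabbit configuration parameters. New documents land in a small volatile index that is flushed to a new persistent sub-index when it fills up; searches consult every sub-index; and once too many sub-indexes accumulate, they are merged.

```python
class MultiIndex:
    """Hypothetical sketch: a small volatile index plus a list of
    persistent sub-indexes that are searched together, like segments."""

    def __init__(self, max_volatile=2, merge_factor=3):
        self.max_volatile = max_volatile  # flush after this many buffered docs
        self.merge_factor = merge_factor  # merge once this many sub-indexes exist
        self.volatile = {}                # doc_id -> set of tokens
        self.persistent = []              # sub-indexes (dicts), oldest first

    def add(self, doc_id, text):
        self.delete(doc_id)               # an update replaces any older version
        self.volatile[doc_id] = set(text.split())
        if len(self.volatile) >= self.max_volatile:
            self._flush()

    def delete(self, doc_id):
        # Simplification: a real sub-index would mark the document as deleted
        # instead of modifying the persisted data in place.
        self.volatile.pop(doc_id, None)
        for sub in self.persistent:
            sub.pop(doc_id, None)

    def _flush(self):
        # Only the small volatile index is written out; the existing
        # sub-indexes are not rewritten, which keeps incremental updates cheap.
        self.persistent.append(self.volatile)
        self.volatile = {}
        if len(self.persistent) >= self.merge_factor:
            merged = {}
            for sub in self.persistent:
                merged.update(sub)        # a doc lives in at most one sub-index
            self.persistent = [merged]

    def search(self, word):
        return sorted(d for sub in self.persistent + [self.volatile]
                      for d, toks in sub.items() if word in toks)


idx = MultiIndex(max_volatile=2, merge_factor=3)
idx.add("a", "x")
idx.add("b", "x")        # volatile is full: flushed to the first sub-index
idx.add("c", "y")        # stays in the volatile index for now
print(idx.search("x"))   # ['a', 'b']
print(idx.search("y"))   # ['c']
```

The point of the structure is that a single save() only touches the small volatile part, while the cost of rewriting large amounts of index data is deferred to occasional merges.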

So, to recapitulate: no, there is no overlap at all between Solr and
Jackrabbit indexing. Currently, there is a little overlap between
Jackrabbit and the latest version of Lucene, because Lucene added some
functionality that Jackrabbit indexing had already partly implemented,
and the two are now somewhat incompatible.

-Ard

[1] http://jackrabbit.apache.org/index-readers.html

> 
> -Bertrand
> 
