jackrabbit-users mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........
Date Mon, 29 Nov 2010 13:43:29 GMT
On Mon, Nov 29, 2010 at 1:49 PM, Ian Boston <ieb@tfd.co.uk> wrote:
> On 29 Nov 2010, at 12:06, Alexander Klimetschek wrote:
>> (as a randomly-accessible
>> binary)
> One of the reasons the JDBCDirectory is not fast is that most DBs don't support seek on
> blobs, and anyway, anything that is shared over a network is just too slow, unless a local
> cached version of the index is made available. I think that's why the Infinispan Directory
> does work. BTW, iirc you can configure Infinispan to page its cache to disk.

Indeed. Lucene needs so many random seeks that the only efficient way
(in my view) is to have the index on local disk. Lucene 4.0 even removes
many internal caches (like FieldCache!!!) and relies completely on
file system caches. This will actually make things like sorting on
tens of millions of titles possible without going OOM.

I haven't looked at the Infinispan code yet, but of course they have a
way to flush memory to disk or to a database. We might add
flushing to JCR, which would make the Lucene segments be flushed into
the repository (as Alexander pointed out earlier).

> I tried a number of impls of remote shared Lucene indexes when I was writing the search
> engine for Sakai 2; all failed. The only solution that worked was one where Lucene was allowed
> to perform seeks on local disk or in memory. (Documents were indexed on one node in the cluster
> (round robin), the indexing nodes shipped segment updates, and all nodes searched on local indexes,
> but not in real time as Jackrabbit does.)

Yes. I recently had some talks with Simon Willnauer, one of the very
few Lucene committers who know how the low-level persistence and reads
work: Lucene cannot perform well on anything other than the file system
or memory. The other day I attended a talk about Lucandra at ApacheCon
in Atlanta: Lucene in a distributed Cassandra ring...they hit performance
penalties after 100,000 Lucene docs...well, it is just not possible (or
I am too stupid) :-)
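
To make the segment-shipping setup Ian describes in the quoted text a bit more concrete, here is a minimal sketch in plain Java (no Lucene or Jackrabbit API involved). It pulls index files from a shared location into a local directory, so that searches only ever seek on local disk. The class name SegmentSync, the method syncSegments, and the directory layout are hypothetical, made up for illustration. It also assumes index files are never modified once written, which is worth checking for your Lucene version, and it ignores deletions of old files and the order in which commit files should be copied.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: not part of Lucene, Jackrabbit, or Sakai.
public class SegmentSync {

    /**
     * Copies every regular file that exists in sharedDir but not yet in
     * localDir, and returns the sorted names of the files it copied.
     * Assumption: files are write-once, so an existing local copy is
     * treated as up to date and is never re-copied.
     */
    public static List<String> syncSegments(Path sharedDir, Path localDir) throws IOException {
        Files.createDirectories(localDir);
        List<String> copied = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sharedDir)) {
            for (Path source : files) {
                if (!Files.isRegularFile(source)) {
                    continue; // skip subdirectories and other non-file entries
                }
                String name = source.getFileName().toString();
                Path target = localDir.resolve(name);
                if (Files.exists(target)) {
                    continue; // already shipped to this node
                }
                // Copy under a temporary name, then rename atomically, so a
                // local searcher never opens a half-written file.
                Path tmp = localDir.resolve(name + ".tmp");
                Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
                copied.add(name);
            }
        }
        Collections.sort(copied);
        return copied;
    }
}
```

Each search node would call something like syncSegments(sharedIndexDir, localIndexDir) on a timer and reopen its local index reader whenever the returned list is non-empty. That is also why such a setup is not real time: a node only sees changes after the next sync.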

Cheers Ard

> Ian

