incubator-lucy-user mailing list archives

From goran kent
Subject [lucy-user] Merging indexes efficiently
Date Wed, 14 Sep 2011 06:37:59 GMT

Accessing is spotty at the
moment, so I'm reading the man pages...

I take it $index->add_index($other_index) is the method to merge
multiple indexes?

I'm thinking of the most efficient way to merge a batch of thousands of indexes:


# TRY1 - hell for leather
$master_index = Lucy::Index::Indexer->new...
foreach $sub_index (...) {
    $master_index->add_index($sub_index);
}
$master_index->commit;

Now, I imagine this is no problem for a handful of sub_indexes, but
what are the risks when this involves thousands?  Are there any
limitations or pitfalls I should be aware of when doing this?

# TRY2 - tippy-toe
foreach $sub_index (...) {
    $master_index->add_index($sub_index);
    if ($cnt++ > $MAX) {
        $cnt = 0;
        $master_index->commit();
        $master_index = Lucy::Index::Indexer->new($master_index, create=>0, truncate=>0...);
    }
}
$master_index->commit unless $already_committed;


# TRY3 - depending on whether I grok prepare_commit()
foreach $sub_index (...) {
    $master_index->add_index($sub_index);
    if ($cnt++ > $MAX) { $cnt=0; $master_index->prepare_commit(); }
}

The question is also what the most efficient $MAX is (I imagine it
depends on RAM, if stuff is kept there before a commit).  Or should
I not overcomplicate things, simply let Lucy worry about the
internals, and gun for TRY1?  TRY2 gives me an opportunity to check the
on-disk $master_index size after each commit (are the buffers flushed
and everything written to disk by a commit, so that qx(du -sh $master)
reflects the actual size?).  I lean towards TRY2, or will TRY3 also
commit to disk?


