incubator-lucy-dev mailing list archives

From Nathan Kurz <>
Subject Re: real time updates
Date Sat, 14 Mar 2009 19:21:10 GMT
On Fri, Mar 13, 2009 at 4:03 PM, Marvin Humphrey <> wrote:
>  Every once in a while, the
> segment merging algorithm decides that it needs to perform a big
> consolidation, and you have to wait while it finishes.

Yes, but that's an artifact of the current approach of adding segments
rather than making real-time replacements.  I was more wondering if
there is anything inherent about the rate of change required that
would prevent a fully incremental update from working.

If it could be pulled off, I think the advantages are large:  no
degradation due to accumulated changes, and no periodic long merges.
There's also the benefit that any changes written are likely to be hot
in the cache, so no warm up is needed.

> How does this interface look?
>  package MyDelWriter;
>  use base qw( Lucy::Index::DeletionsWriter );
>  ...

This feels to me like it is solving the wrong problem.  There's
nothing wrong with it, but DeletionsWriter and DeletionsReader seem
like internal implementation details of a particular type of Index.
Should the callers even have to know about their existence?

I'd hope that the interface between a Scorer and an Index could be
very simple, probably just a single function to get a PostingList.
That PostingList would provide navigation by docID, but deletions
would be handled internally and never be seen by the Scorer.
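
A rough sketch of what I mean, in Python purely for illustration.  The
class and its next()/advance() methods are hypothetical names, not
Lucy's actual API, and the tombstone set is just one assumed way an
Index might track deletions.  The point is only that the Scorer never
sees a deleted docID:

```python
from bisect import bisect_left


class PostingList:
    """Hypothetical doc ID iterator for one term; not Lucy's real API."""

    def __init__(self, doc_ids, deleted=()):
        self._doc_ids = sorted(doc_ids)  # postings as stored by the index
        self._deleted = set(deleted)     # tombstones, private to the index
        self._pos = -1                   # positioned before the first entry

    def _skip_deleted(self):
        # Step over tombstoned entries, then report the current doc ID.
        while (self._pos < len(self._doc_ids)
               and self._doc_ids[self._pos] in self._deleted):
            self._pos += 1
        if self._pos < len(self._doc_ids):
            return self._doc_ids[self._pos]
        return None

    def next(self):
        """Move to the next live doc ID, or return None when exhausted."""
        self._pos += 1
        return self._skip_deleted()

    def advance(self, target):
        """Move to the first live doc ID >= target, or return None."""
        self._pos = max(self._pos, bisect_left(self._doc_ids, target))
        return self._skip_deleted()
```

A Scorer written against this would call only next() and advance(), so
swapping tombstones for some other deletion scheme would not touch it.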

For indexing, I'd love to see the same agnostic behaviour.  The
Indexer knows only about a single function like
UpdatePosting(docID, newPostings).  Whether this is done internally
via tombstones, real-time updates or carrier pigeon is hidden from the
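
To make that concrete, here is a rough sketch in Python, for
illustration only.  All of the names (IndexBackend, update_posting,
doc_ids, the two backend classes) are made up for this example and are
not Lucy's API, and treating newPostings as a plain collection of terms
is a simplifying assumption.  Two backends sit behind the same call,
one replacing in place and one using tombstones:

```python
from abc import ABC, abstractmethod


class IndexBackend(ABC):
    """Hypothetical minimal surface visible to the indexing code."""

    @abstractmethod
    def update_posting(self, doc_id, new_postings):
        """Replace whatever is stored for doc_id with new_postings."""

    @abstractmethod
    def doc_ids(self, term):
        """Return the sorted live doc IDs that contain term."""


class InPlaceIndex(IndexBackend):
    """Overwrites the old postings for a document immediately."""

    def __init__(self):
        self._terms_by_doc = {}

    def update_posting(self, doc_id, new_postings):
        self._terms_by_doc[doc_id] = set(new_postings)

    def doc_ids(self, term):
        return sorted(d for d, terms in self._terms_by_doc.items()
                      if term in terms)


class TombstoneIndex(IndexBackend):
    """Append-only: an update adds an entry and tombstones the old one."""

    def __init__(self):
        self._entries = []  # append-only list of (doc_id, terms)
        self._dead = set()  # positions of superseded entries
        self._live = {}     # doc_id -> position of its current entry

    def update_posting(self, doc_id, new_postings):
        if doc_id in self._live:
            self._dead.add(self._live[doc_id])
        self._live[doc_id] = len(self._entries)
        self._entries.append((doc_id, set(new_postings)))

    def doc_ids(self, term):
        return sorted(d for i, (d, terms) in enumerate(self._entries)
                      if i not in self._dead and term in terms)
```

Code that calls update_posting() gets the same answers from either
backend and cannot tell which strategy is in use.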

So while the interface you propose is probably great for making small
modifications to the current Index, I'd rather it not be part of the
official API that all Index formats must support.  I want each
component to make as few assumptions as possible about the internals
of other components.

My canonical example for this is that I want to be able to store my
index in SQLite, and write a thin layer of interface between it and
the rest of Lucy.  But my real desire is to substitute a custom mmap()
solution such as the fast graph database referenced earlier.
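
As a rough sketch of how thin the SQLite layer could be (Python and its
standard sqlite3 module, for illustration only): the SQLiteIndex class,
its update_posting() and doc_ids() methods, and the one-table schema
are all assumptions invented for this example, not anything Lucy
defines.

```python
import sqlite3


class SQLiteIndex:
    """Hypothetical index backend that keeps postings in one SQLite table."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS postings ("
            " term TEXT NOT NULL,"
            " doc_id INTEGER NOT NULL,"
            " PRIMARY KEY (term, doc_id))"
        )

    def update_posting(self, doc_id, new_postings):
        # One transaction: drop the old postings, then insert the new ones.
        with self._db:
            self._db.execute(
                "DELETE FROM postings WHERE doc_id = ?", (doc_id,))
            self._db.executemany(
                "INSERT INTO postings (term, doc_id) VALUES (?, ?)",
                [(term, doc_id) for term in set(new_postings)],
            )

    def doc_ids(self, term):
        # Sorted doc IDs for a term; deletions never reach the caller.
        rows = self._db.execute(
            "SELECT doc_id FROM postings WHERE term = ? ORDER BY doc_id",
            (term,),
        )
        return [row[0] for row in rows]
```

Nothing outside the class needs to know that SQL is involved, which is
the property I would want an mmap()-based replacement to share.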

I think the easiest way to make this possible is to reduce the points
of intersection between the components to the simplest set possible.
Instead of specifying a full internal API for each component, specify
(and restrict) only the portions visible to the rest of the

Nathan Kurz
