lucene-dev mailing list archives

From Michael McCandless
Subject Re: Future projects
Date Thu, 02 Apr 2009 19:59:46 GMT
On Thu, Apr 2, 2009 at 2:07 PM, Jason Rutherglen wrote:
> I'm interested in merging cached bitsets and field caches.  While this may
> be something related to LUCENE-831, in Bobo there are custom field caches
> which we want to merge in RAM (rather than reload from the reader using
> termenum + termdocs).  This could somehow lead to delete by doc id.

What does Bobo use the cached bitsets for?

Merging FieldCache in RAM is also interesting for near-realtime
search, once we have column stride fields.  I.e., they should behave
like deleted docs: there's no reason to go through disk when merging
them -- just carry them straight to the merged reader.  Only on commit
do they need to go to disk.  Hmm, in fact we could do this today, too,
e.g. with norms, as a future optimization if needed.  And that
optimization applies to flushing as well (i.e., when flushing a new
segment, since we know we will open a reader, we could NOT flush the
norms, and instead put them into the reader, and only flush them to
disk on eventual commit).
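
To make the "carry them straight to the merged reader" idea concrete,
here is a rough sketch of merging per-document values (norms, or a
column stride field) purely in RAM while dropping deleted docs.  The
class and method names are made up for illustration and are not Lucene
APIs; it also assumes one byte per doc and segments merged in order.

```java
import java.util.List;

/** Hypothetical helper, not part of Lucene: merges per-doc byte values in RAM. */
final class InRamColumnMerge {

    /**
     * Concatenates the per-segment values in segment order, skipping any
     * doc flagged as deleted, so nothing has to be re-read from disk.
     */
    static byte[] merge(List<byte[]> segmentValues, List<boolean[]> segmentDeleted) {
        // First pass: count surviving docs to size the merged array.
        int live = 0;
        for (boolean[] deleted : segmentDeleted) {
            for (boolean d : deleted) {
                if (!d) {
                    live++;
                }
            }
        }

        // Second pass: copy the values of surviving docs in order.
        byte[] merged = new byte[live];
        int out = 0;
        for (int s = 0; s < segmentValues.size(); s++) {
            byte[] values = segmentValues.get(s);
            boolean[] deleted = segmentDeleted.get(s);
            for (int doc = 0; doc < values.length; doc++) {
                if (!deleted[doc]) {
                    merged[out++] = values[doc];
                }
            }
        }
        return merged;
    }
}
```

The merged array would simply be handed to the new reader; writing it
out would wait until commit.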

> Tracking the genealogy of segments is something we can provide as a callback
> from IndexWriter?  Or could we add a method to IndexCommit or SegmentReader
> that returns the segments it originated from?

Well... the problem with my idea (callback from IW when docs shift)
is that internally IW always uses the latest reader to get any new docIDs.

I.e., we only have to renumber from gen X to X+1, then from X+1 to X+2
(where each "generation" is a renumbering event).

But if you have a reader, perhaps oldish by now, we'd need to give you
a way to map across N generations of docID shifts (which'd require the
genealogy tracking).
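
Roughly, mapping across N generations would mean keeping every
per-generation old-to-new docID map and chaining them.  A minimal
sketch follows, assuming each renumbering event can be captured as an
int[] map; DocIdGenealogy and its methods are hypothetical, not
anything IndexWriter provides.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not a Lucene API: chains per-generation docID maps. */
final class DocIdGenealogy {

    /** Marker for a doc that was deleted by some renumbering event. */
    static final int DELETED = -1;

    // maps.get(g)[oldDocId] is the docID in generation g + 1, or DELETED.
    private final List<int[]> maps = new ArrayList<>();

    /** Records the renumbering from the current latest generation to the next. */
    void addGeneration(int[] oldToNew) {
        maps.add(oldToNew.clone());
    }

    /** Number of renumbering events recorded so far. */
    int latestGeneration() {
        return maps.size();
    }

    /** Maps a docID as seen at generation fromGen to the latest generation. */
    int mapToLatest(int fromGen, int docId) {
        int current = docId;
        for (int gen = fromGen; gen < maps.size(); gen++) {
            if (current == DELETED) {
                return DELETED; // once deleted, the doc stays deleted
            }
            current = maps.get(gen)[current];
        }
        return current;
    }

    /** Builds the old-to-new map that results from compacting away deleted docs. */
    static int[] compact(boolean[] deleted) {
        int[] oldToNew = new int[deleted.length];
        int next = 0;
        for (int i = 0; i < deleted.length; i++) {
            oldToNew[i] = deleted[i] ? DELETED : next++;
        }
        return oldToNew;
    }
}
```

Even in this toy form, every map has to be retained for as long as any
reader from that generation might still be open.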

Alas I think it will quickly get hairy.

