lucene-dev mailing list archives

From Michael McCandless
Subject Re: Realtime Search for Social Networks Collaboration
Date Fri, 19 Sep 2008 17:42:11 GMT

Jason Rutherglen wrote:

> How do column-stride fields work for StringIndex field caching?

I'm not sure -- Michael Busch is working on column-stride fields.

> I have been working on the tag index, which may be more suitable for
> field caching and makes range queries faster.  It is something that
> would be good to integrate into core Lucene as well.  It may be more
> suitable for many situations.  Perhaps the column stride and tag index
> can be merged?  What is the progress on cs?

Michael, can you answer on the progress of column-stride fields and how
Jason's tag index would apply?

>> Reopen then must only "materialize" any
>> buffered deletes by Term & Query, unless we decide to move up that
>> materialization into the actual delete call, since we will have
>> SegmentReaders open anyway.  I think I'm leaning towards that  
>> approach...
>> best to pay the cost as you go, instead of aggregated cost on reopen?
> I don't follow this part.  There is an IndexReader exposed from
> IndexWriter.  I think the individual SegmentReaders should be exposed
> as well, I don't see any reason not to and there are many cases where
> it has been frustrating that SegmentReaders are package protected.

Well, you ask IndexWriter for a reader.  It returns to you an
IndexReader impl that under the hood is basically a MultiReader over a
bunch of SegmentReaders (already flushed to the index), plus the
RAMReader.  We may want to expose access to these sub-readers, but that's
orthogonal I think?
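
To make the "MultiReader over sub-readers" idea concrete, here is a rough
sketch of how a composite reader could map a global doc ID onto one of its
sub-readers (the flushed segments first, the RAM buffer last).  The class
and method names are hypothetical and nothing here is actual Lucene code;
it only assumes that each sub-reader reports how many docs it holds and
that the doc ID spaces are simply concatenated.

```java
// Hypothetical sketch, not Lucene's MultiReader: concatenates the doc ID
// spaces of several sub-readers and maps a global doc ID back to
// (sub-reader index, local doc ID).
class CompositeDocIds {
    // starts[i] is the first global doc ID owned by sub-reader i;
    // the final entry is the total doc count across all sub-readers.
    private final int[] starts;

    CompositeDocIds(int[] subReaderDocCounts) {
        starts = new int[subReaderDocCounts.length + 1];
        for (int i = 0; i < subReaderDocCounts.length; i++) {
            starts[i + 1] = starts[i] + subReaderDocCounts[i];
        }
    }

    int totalDocs() {
        return starts[starts.length - 1];
    }

    // Index of the sub-reader that owns the given global doc ID.
    int subReaderIndex(int globalDocId) {
        if (globalDocId < 0 || globalDocId >= totalDocs()) {
            throw new IndexOutOfBoundsException("doc ID out of range: " + globalDocId);
        }
        // Linear scan keeps the sketch simple; empty sub-readers are skipped
        // naturally because their start equals the next start.
        for (int i = 0; i < starts.length - 1; i++) {
            if (globalDocId < starts[i + 1]) {
                return i;
            }
        }
        throw new IllegalStateException("unreachable");
    }

    // Doc ID relative to the owning sub-reader.
    int localDocId(int globalDocId) {
        return globalDocId - starts[subReaderIndex(globalDocId)];
    }
}
```

With two flushed segments of 5 and 3 docs plus 2 docs buffered in RAM,
global doc 5 would be local doc 0 of the second segment, and global doc 9
would be local doc 1 of the RAM buffer.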

> I am not sure from what you mentioned how the deletedDocs bitvector is
> handled.

I'm now thinking each SegmentReader holds its own deletedDocs as well  
as a pending deletedDocs (deletes that happened since the last  
reopen).  As deletes are done (by Query, Term or doc ID) in  
IndexWriter, they are synchronously materialized & recorded against  
the pending deletedDocs for each SegmentReader as well as the RAM  
deletedDocs (that apply to docs buffered in RAM).  When you reopen,  
the pending deletions are merged with the deletedDocs.
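
Roughly, the per-segment bookkeeping described above might look like the
sketch below.  This is only an illustration under my own assumptions: the
class name SegmentDeletes and its methods are made up, java.util.BitSet
stands in for Lucene's bit vector, and resolving a Term or Query to doc IDs
is assumed to have happened before delete() is called.

```java
import java.util.BitSet;

// Hypothetical sketch of the two-bitset idea; not actual Lucene code.
class SegmentDeletes {
    // Deletes visible to the currently open reader.
    private final BitSet deletedDocs = new BitSet();
    // Deletes recorded since the last reopen; not yet visible to searches.
    private final BitSet pendingDeletedDocs = new BitSet();

    // Called synchronously by the writer once a delete has been
    // materialized to a concrete doc ID in this segment.
    synchronized void delete(int docId) {
        pendingDeletedDocs.set(docId);
    }

    // What a search against the current reader sees.
    synchronized boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }

    // Number of deletes still waiting for the next reopen.
    synchronized int pendingCount() {
        return pendingDeletedDocs.cardinality();
    }

    // On reopen, merge the pending deletes into the visible set and
    // start a fresh pending set.
    synchronized void reopen() {
        deletedDocs.or(pendingDeletedDocs);
        pendingDeletedDocs.clear();
    }
}
```

The point of the split is that the cost of materializing deletes is paid at
delete time, so reopen is only a bitwise OR per segment.  A real reader
would presumably take a snapshot of deletedDocs on reopen rather than share
the mutable bitset; the sketch skips that.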

