lucene-dev mailing list archives

From Nicolas Lalevée
Subject Re: Per-document Payloads
Date Tue, 30 Oct 2007 14:29:41 GMT
On Monday 29 October 2007, Michael McCandless wrote:
> "Michael Busch" wrote:
> > Michael McCandless wrote:
> > > Michael, are you thinking that the storage would/could be non-sparse
> > > (like norms), and loaded/cached once in memory, especially for fixed
> > > size fields?  EG a big array of ints of length maxDocID?  In John's
> > > original case, every doc has this UID int field; I think this is
> > > fairly common.
> >
> > Yes I agree, this is a common use case. In my first mail in this thread
> > I suggested to have a flexible format. Non-sparse, like norms, in case
> > every document has one value and all values have the same fixed size.
> > Sparse and with a skip list if one or both conditions are false.
> >
> > The DocumentsWriter would have to check whether both conditions are
> > true, in which case it would store the values non-sparse. The
> > SegmentMerger would only write the non-sparse format for the new segment
> > if all of the source segments also had the non-sparse format with the
> > same value size.
> >
> > This would provide the most flexibility for the users I think.
> OK, got it.  So in the case where I always put a field "UID" on every
> document, always a 4-byte binary field, then Lucene will "magically"
> store this as non-sparse column-stride field for every segment.
> But I still have to mark the Field as "column-stride storage" right?

It depends on what the API should look like. Either Lucene supports every
format itself, so you have to explicitly bind fields to a format, or you
open up the API so that it is the Lucene user who chooses how to store their
data.

As said earlier in the thread, some work has been done on the second
option: LUCENE-662
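
To make the rule quoted above concrete, here is a minimal sketch of the two decisions Michael Busch describes: what a new segment would store, and what a merge would write. It is only an illustration under stated assumptions. The class, the enum, and the method names (ColumnStrideFormatSketch, SegmentFieldInfo, chooseForNewSegment, chooseForMerge) are hypothetical and are not part of Lucene or of LUCENE-662, and treating an empty segment as sparse is an arbitrary choice made here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names exist in Lucene.
class ColumnStrideFormatSketch {

    enum Format { NON_SPARSE, SPARSE }

    /** Assumed per-segment summary of how one field is stored. */
    static final class SegmentFieldInfo {
        final Format format;
        final int valueSize; // bytes per value, -1 when the format is SPARSE

        SegmentFieldInfo(Format format, int valueSize) {
            this.format = format;
            this.valueSize = valueSize;
        }
    }

    private static final SegmentFieldInfo SPARSE_INFO =
            new SegmentFieldInfo(Format.SPARSE, -1);

    /**
     * values[docId] holds the field bytes of that document, or null if the
     * document has no value. Non-sparse is chosen only when every document
     * has a value and all values have the same size.
     */
    static SegmentFieldInfo chooseForNewSegment(byte[][] values) {
        if (values.length == 0) {
            return SPARSE_INFO; // assumption: an empty segment counts as sparse
        }
        int size = -1;
        for (byte[] value : values) {
            if (value == null) {
                return SPARSE_INFO; // a document without a value
            }
            if (size == -1) {
                size = value.length;
            } else if (value.length != size) {
                return SPARSE_INFO; // values of different sizes
            }
        }
        return new SegmentFieldInfo(Format.NON_SPARSE, size);
    }

    /**
     * The merged segment is non-sparse only if all source segments are
     * non-sparse and share the same value size.
     */
    static SegmentFieldInfo chooseForMerge(List<SegmentFieldInfo> sources) {
        if (sources.isEmpty()) {
            return SPARSE_INFO;
        }
        int size = sources.get(0).valueSize;
        for (SegmentFieldInfo source : sources) {
            if (source.format != Format.NON_SPARSE || source.valueSize != size) {
                return SPARSE_INFO;
            }
        }
        return new SegmentFieldInfo(Format.NON_SPARSE, size);
    }

    public static void main(String[] args) {
        // Every document carries a 4-byte UID, as in the "UID" case above.
        byte[][] uids = { {0, 0, 0, 1}, {0, 0, 0, 2}, {0, 0, 0, 3} };
        // The second document has no value for the field.
        byte[][] withGap = { {0, 0, 0, 1}, null };

        SegmentFieldInfo dense = chooseForNewSegment(uids);
        SegmentFieldInfo gappy = chooseForNewSegment(withGap);

        System.out.println(dense.format + " " + dense.valueSize);
        System.out.println(chooseForMerge(Arrays.asList(dense, dense)).format);
        System.out.println(chooseForMerge(Arrays.asList(dense, gappy)).format);
    }
}
```

With a rule like this, the user would not have to pick between sparse and non-sparse. Whether the field still has to be marked as column-stride at all is the API question discussed above.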
