lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Per-document Payloads
Date Mon, 22 Oct 2007 15:53:12 GMT
Michael Busch wrote:
> If you store unique docIds, then there are no two documents that share
> the same value. That means that each value gets its own entry in the
> dictionary and to load each value it is necessary to perform two random
> I/O seeks (one for term lookup + one to open the posting list).

Except they shouldn't be random seeks, but rather sequential accesses, 
since the term list is accessed in order, and the postings are processed 
in order, no?  It would be interesting to profile this.  We should make 
sure that we've optimized well for the case where we seek to the next 
term, seek to the current position in a file, etc.  Profiling should 
show whether we've missed obvious optimizations for this case.
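To make the sequential-access argument concrete, here is a minimal sketch assuming a hypothetical in-memory stand-in for one field's term dictionary (`SortedTermIndex` and `loadValues` are illustrative names, not Lucene API). Terms are visited once in sorted order and each posting list is read front to back, so neither the term file nor the postings file would ever be read backwards. This is similar in spirit to how Lucene's FieldCache un-inverts a field with a TermEnum/TermDocs walk, but it only models the access pattern.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for a field's term dictionary: terms in sorted
 * order, each mapped to an ascending posting list of document ids.
 * Not Lucene's API; it only models the access pattern being discussed.
 */
final class SortedTermIndex {
    private final TreeMap<String, int[]> postings = new TreeMap<>();

    void add(String term, int... docIds) {
        int[] sorted = docIds.clone();
        Arrays.sort(sorted);          // postings are kept in docId order
        postings.put(term, sorted);
    }

    /**
     * Builds a docId -> value array by walking every term once, in term
     * order, and reading each posting list front to back. Neither walk
     * moves backwards, so on disk both would be sequential reads.
     */
    String[] loadValues(int maxDoc) {
        String[] values = new String[maxDoc];
        for (Map.Entry<String, int[]> e : postings.entrySet()) { // ascending terms
            for (int doc : e.getValue()) {                        // ascending docs
                values[doc] = e.getKey();
            }
        }
        return values;
    }
}
```

With unique ids every posting list has length one, so the cost is dominated by advancing through the term dictionary, which is exactly the "seek to the next term" path worth profiling.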

> I was therefore thinking about adding per-document payloads to Lucene

If this is really required, perhaps it ought to appear as an attribute 
for stored fields, indicating that the field should be stored in a 
separate "column store".  This would permit efficient enumeration of 
values of just that field.
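A separate "column store" for one field could look roughly like the sketch below, assuming fixed-width numeric values (the `LongColumn` class is hypothetical, not part of Lucene). Each document's value sits at offset `doc * Long.BYTES` with no other stored fields interleaved, so a single lookup is one computed offset and enumerating the whole field is one sequential scan.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical fixed-width column for a single numeric field. The value
 * for document d lives at byte offset d * Long.BYTES; no other fields are
 * interleaved, unlike a row-oriented stored-fields file.
 */
final class LongColumn {
    private final ByteBuffer buf;
    private final int maxDoc;

    LongColumn(long[] valuesByDoc) {
        this.maxDoc = valuesByDoc.length;
        this.buf = ByteBuffer.allocate(maxDoc * Long.BYTES);
        for (long v : valuesByDoc) {
            buf.putLong(v);           // values written in docId order
        }
        buf.flip();                   // make the written bytes readable
    }

    /** Random access: one computed offset, no term-dictionary lookup. */
    long get(int doc) {
        return buf.getLong(doc * Long.BYTES);
    }

    /** Enumerates only this field's values with one sequential pass. */
    long[] readAll() {
        long[] out = new long[maxDoc];
        ByteBuffer view = buf.duplicate(); // independent position, shared bytes
        for (int d = 0; d < maxDoc; d++) {
            out[d] = view.getLong();
        }
        return out;
    }
}
```

Variable-length values would need an additional offsets column, but the key property is the same: reading one field never touches the bytes of the others.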

Doug
