lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: Could positions/payloads in SegmentMerger be copied directly?
Date Tue, 23 Sep 2008 08:56:04 GMT

Paul Elschot wrote:

> I had another look at SegmentTermDocs.skipTo() and at
> SegmentTermPositions, and I think I'm beginning to get
> your point.
> Could it be doable per skipInterval docs?

Almost ... but not quite, except maybe for the first segment being
merged.

The problem is, the new skip data will not in general be "aligned" to
the old skip data, except for the first segment.

E.g., say the skipInterval is 16, and for term "foo" the first segment
has 18 docs and the 2nd segment has 22 docs.  We could bulk-copy the
first chunk of 16 docs from the first segment, but then we write the
remaining 2 docs, and 14 docs into the 2nd segment we need to write
new skip data.  So we cannot bulk-copy the 2nd segment, since then we
won't know the byte offset at that 14-doc point.
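
To make the misalignment concrete, here is a minimal sketch in Java. It
is not Lucene code: the class and method names are made up for
illustration, and it assumes a single skip level, no deleted docs, and
one skip entry after every skipInterval docs of the merged posting
list. For each merged skip point it reports which source segment the
point falls in, how many docs into that segment it is, and whether that
position coincides with one of the segment's own skip points.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class SkipAlignmentSketch {

    /**
     * Returns one int[3] per skip point of the merged posting list:
     * {segment index, docs into that segment, 1 if aligned else 0}.
     * "Aligned" means the segment already had a skip point there,
     * i.e. the local position is a multiple of skipInterval.
     */
    static List<int[]> mergedSkipPoints(int skipInterval, int[] docCounts) {
        List<int[]> points = new ArrayList<int[]>();
        int mergedBase = 0; // docs written before the current segment
        for (int seg = 0; seg < docCounts.length; seg++) {
            int segEnd = mergedBase + docCounts[seg];
            // First multiple of skipInterval strictly after mergedBase.
            int next = (mergedBase / skipInterval + 1) * skipInterval;
            for (int p = next; p <= segEnd; p += skipInterval) {
                int local = p - mergedBase;
                int aligned = (local % skipInterval == 0) ? 1 : 0;
                points.add(new int[] {seg, local, aligned});
            }
            mergedBase = segEnd;
        }
        return points;
    }
}
```

Calling mergedSkipPoints(16, new int[] {18, 22}) yields two skip
points: 16 docs into segment 0 (aligned) and 14 docs into segment 1
(not aligned). The second one is the point whose byte offset a plain
bulk copy of segment 1 would not give us.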

I guess we could entertain allowing skip intervals to not be
"regular", such that at the boundaries of previously merged segments
it's allowed to be different, but that's getting more invasive.

We have recently made great strides in making merging a bulk byte-copy
operation when possible (e.g. stored fields & term vectors do this
now), so I agree it'd be fabulous to get the postings to do bulk byte
copy.  They are the slowest part of merging now.

The frq postings could "almost" be made appendable, if we stored the
last docID of each posting list in the term dictionary.  This way we
could append, rewriting only the first document of each segment after
the first segment so that it holds the delta between its docID and the
last docID in the segment before it.  But again we'd be in trouble
with the skip data.
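
A rough sketch of that append idea, again in Java with made-up names
rather than real Lucene classes. It assumes the postings are plain
arrays of docID deltas (no interleaved freq bit, no VInt encoding, no
deleted docs), that the first delta of a segment equals its first
docID, and that docBase is the number of docs in all earlier segments.
With real VInt bytes the rewritten first entry could also change
length, which this sketch ignores.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene.
final class DeltaAppendSketch {

    /** Encodes sorted docIDs as deltas; the first delta is taken from 0. */
    static int[] toDeltas(int[] docIds) {
        int[] deltas = new int[docIds.length];
        int prev = 0;
        for (int i = 0; i < docIds.length; i++) {
            deltas[i] = docIds[i] - prev;
            prev = docIds[i];
        }
        return deltas;
    }

    /** Decodes deltas back to absolute docIDs. */
    static int[] fromDeltas(int[] deltas) {
        int[] docIds = new int[deltas.length];
        int doc = 0;
        for (int i = 0; i < deltas.length; i++) {
            doc += deltas[i];
            docIds[i] = doc;
        }
        return docIds;
    }

    /**
     * Appends the second segment's deltas to the first's. Everything is
     * copied as-is except the second segment's first entry, which is
     * rewritten relative to lastDocIdOfFirst (the value that would have
     * to be stored in the term dictionary).
     */
    static int[] append(int[] firstDeltas, int lastDocIdOfFirst,
                        int[] secondDeltas, int docBase) {
        int[] out = Arrays.copyOf(firstDeltas,
                                  firstDeltas.length + secondDeltas.length);
        System.arraycopy(secondDeltas, 0, out,
                         firstDeltas.length, secondDeltas.length);
        if (secondDeltas.length > 0) {
            // secondDeltas[0] is the first docID local to the second segment.
            out[firstDeltas.length] = docBase + secondDeltas[0] - lastDocIdOfFirst;
        }
        return out;
    }
}
```

For example, with docIDs {0, 3, 7} in a first segment of 10 docs and
docIDs {2, 5} in the second, append(toDeltas(first), 7,
toDeltas(second), 10) decodes back to {0, 3, 7, 12, 15}. Only one entry
of the second list was touched, but nothing here regenerates the skip
data, which is the part that remains a problem.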

