lucene-dev mailing list archives

From Robert Muir
Subject mixing dv types in one IW session
Date Mon, 28 May 2012 19:53:34 GMT

Just playing around, I wanted to see what happens if you
change a docvalues type across different documents in a single IW
session, e.g.

case 1:
doc1.add(new IntDocValuesField("foo", 5))
doc2.add(new FloatDocValuesField("foo", 2.5f))

in this case the 2.5f is truncated to an int and becomes a 2
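
For reference, the result in case 1 is what a plain Java narrowing cast
from float to int produces. A minimal sketch (TruncationDemo and
storeAsInt are made-up names for illustration, not Lucene code, and the
real IntsWriter code path may well look different):

```java
// Hypothetical illustration only; not taken from Lucene.
final class TruncationDemo {

    // A float-to-int narrowing cast rounds toward zero,
    // so the fractional part is simply dropped.
    static int storeAsInt(float value) {
        return (int) value;
    }
}
```

Calling TruncationDemo.storeAsInt(2.5f) returns 2, and a negative input
such as -2.5f returns -2, since the cast rounds toward zero rather than
down.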

case 2:
doc3.add(new StraightBytesDocValuesField("foo", new BytesRef("boo!")))

in this case you hit an NPE in IntsWriter, because the straightbytes
impl naturally cannot return an intvalue.

So I'm wondering what we should do?
Currently both merging and multidocvalues do a type-promotion, but if
the type changes within the same IW session no promotion takes place.

idea 1: throw an exception if the type is changed in one session. this
leaves things a little inconsistent, but prevents strange results.
idea 2: throw an exception if the type is changed *and also on
merge/multidocvalues*. This seems a little cruel (no way to upgrade
your short to int if you need later) but would simplify some code.
(evil) idea 3: force a flush if the type is changed and let merging
take care of it.
idea 4: buffer docvalues in ram in IW instead of inside the codec, in
a "type-independent way" (e.g. sorted hash of the unique byte values +
per-doc ords). this is a lot of work, but would make the codec side of
DV simpler as it just does encode/decode and wouldn't have to do ram
accounting or deal with types changing or any of that.
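
To make ideas 1 and 4 a bit more concrete, here is a rough sketch using
only the JDK. Everything in it is hypothetical (DocValuesSketch,
TypeGuard, RawValueBuffer and their methods are invented names, and none
of this is existing IW or codec code). It assumes values arrive already
encoded as byte[], and it leaves out RAM accounting and decoding.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are invented for illustration.
final class DocValuesSketch {

    /** Idea 1: remember the first type seen per field and reject any change. */
    static final class TypeGuard {
        private final Map<String, String> typeByField = new HashMap<>();

        void check(String field, String type) {
            String previous = typeByField.putIfAbsent(field, type);
            if (previous != null && !previous.equals(type)) {
                throw new IllegalArgumentException("field \"" + field
                        + "\" changed docvalues type from " + previous
                        + " to " + type + " in one session");
            }
        }
    }

    /** Idea 4: buffer values as raw bytes, with a hash of unique values plus per-doc ords. */
    static final class RawValueBuffer {
        // unique value -> temporary ord, assigned in order of first appearance
        private final Map<ByteBuffer, Integer> tempOrds = new HashMap<>();
        private final List<ByteBuffer> uniqueValues = new ArrayList<>();
        // one temporary ord per added document
        private final List<Integer> docToTempOrd = new ArrayList<>();

        void add(byte[] value) {
            // Copy the bytes so later changes by the caller cannot corrupt the key.
            ByteBuffer key = ByteBuffer.wrap(value.clone());
            Integer ord = tempOrds.get(key);
            if (ord == null) {
                ord = uniqueValues.size();
                tempOrds.put(key, ord);
                uniqueValues.add(key);
            }
            docToTempOrd.add(ord);
        }

        int uniqueCount() {
            return uniqueValues.size();
        }

        /** Sorts the unique values and returns, per doc, the ord into that sorted order. */
        int[] sortedOrds() {
            List<ByteBuffer> sorted = new ArrayList<>(uniqueValues);
            Collections.sort(sorted); // ByteBuffer compares byte by byte (signed bytes)
            int[] remap = new int[sorted.size()];
            for (int i = 0; i < sorted.size(); i++) {
                remap[tempOrds.get(sorted.get(i))] = i;
            }
            int[] result = new int[docToTempOrd.size()];
            for (int doc = 0; doc < result.length; doc++) {
                result[doc] = remap[docToTempOrd.get(doc)];
            }
            return result;
        }
    }

    // Helpers that turn typed values into bytes before buffering.
    static byte[] intBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    static byte[] floatBytes(float v) {
        return ByteBuffer.allocate(4).putFloat(v).array();
    }
}
```

With the guard, calling check("foo", "int") and then check("foo", "float")
throws on the second call. With the buffer, an int, a float and a
BytesRef-style value for the same field all go in as bytes without the
buffer caring about the type. Whoever decodes the values later would still
need to know how to interpret them, which this sketch does not address.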

any other ideas?

