lucene-dev mailing list archives

From Michael McCandless
Subject Re: mixing dv types in one IW session
Date Tue, 29 May 2012 16:20:08 GMT
I agree this inconsistency is bad... and silently losing stuff (float
2.5 becomes int 2) is really bad.  We should do something before 4.0.

I would prefer idea 2, i.e. that we never allow changing/promoting a DV
type for a given field, and that we do our best to throw a clear exception
if you try. I realize this is different from other things in Lucene where
"anything goes", but DV is new in 4.0, so we are free to set new rules.
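
To make the rule in idea 2 concrete, here is a minimal sketch of a per-field
type check. It is not Lucene code: DvTypeGuard, DvType and check() are
hypothetical names invented for illustration, and the real check would have
to live wherever IW tracks per-field state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DvTypeGuard {
    /** Hypothetical stand-in for the doc values types discussed in this thread. */
    enum DvType { INT, FLOAT, BYTES }

    // First type seen for each field name; every later add must match it.
    private final Map<String, DvType> typeByField = new ConcurrentHashMap<>();

    /** Records the type on first use and rejects any later change for that field. */
    void check(String field, DvType type) {
        DvType previous = typeByField.putIfAbsent(field, type);
        if (previous != null && previous != type) {
            throw new IllegalArgumentException(
                "cannot change DocValues type from " + previous + " to " + type
                    + " for field \"" + field + "\"");
        }
    }

    public static void main(String[] args) {
        DvTypeGuard guard = new DvTypeGuard();
        guard.check("foo", DvType.INT);       // first use fixes the type
        guard.check("foo", DvType.INT);       // same type again is fine
        try {
            guard.check("foo", DvType.FLOAT); // rejected instead of truncating 2.5f
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that the failure is loud and names both the
field and the two types, instead of silently producing a 2.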

Also, if this somehow later proves to be a bad decision, we can always
add back in this leniency ... but not vice-versa.

Mike McCandless

On Mon, May 28, 2012 at 3:53 PM, Robert Muir wrote:
> Hello,
> Just doing some playing around, I wanted to see what happens if you
> change a docvalues type across different documents in a single IW
> session, e.g.
> case 1:
> doc1.add(new IntDocValuesField("foo", 5))
> doc2.add(new FloatDocValuesField("foo", 2.5f))
> in this case the 2.5f is truncated to an int and becomes a 2
> case 2:
> doc3.add(new StraightBytesDocValuesField("foo", new BytesRef("boo!")))
> in this case you hit an NPE in IntsWriter, because the straightbytes
> impl naturally cannot return an intvalue.
> So I'm wondering what we should do?
> Currently both merging and multidocvalues do a type-promotion, but if
> the type changes within the same IW session, no promotion happens.
> idea 1: throw an exception if the type is changed in one session. this
> leaves things a little inconsistent, but prevents strange results.
> idea 2: throw an exception if the type is changed *and also on
> merge/multidocvalues*. This seems a little cruel (no way to upgrade
> your short to int if you need later) but would simplify some code.
> (evil) idea 3: force a flush if the type is changed and let merging
> take care of it.
> idea 4: buffer docvalues in ram in IW instead of inside the codec, in
> a "type-independent way" (e.g. sorted hash of the unique byte values +
> per-doc ords). this is a lot of work, but would make the codec side of
> DV simpler as it just does encode/decode and wouldn't have to do RAM
> accounting or deal with types changing or any of that.
> any other ideas?
> --
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
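
For reference, idea 4 in the quoted message (a hash of the unique byte values
plus per-doc ords) could look roughly like the sketch below. It assumes every
value has already been encoded to bytes by the caller. OrdBufferSketch, add()
and sortedOrds() are hypothetical names, not anything in IW or the codec API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrdBufferSketch {
    // Unique values keyed by their bytes, mapped to an id in first-seen order.
    private final Map<ByteBuffer, Integer> idByValue = new HashMap<>();
    private final List<ByteBuffer> values = new ArrayList<>();
    // One id per buffered document, in document order.
    private final List<Integer> idPerDoc = new ArrayList<>();

    /** Buffers one document's value; the buffer never looks at the value's type. */
    void add(byte[] value) {
        ByteBuffer key = ByteBuffer.wrap(value.clone()); // copy so callers can reuse arrays
        Integer id = idByValue.get(key);
        if (id == null) {
            id = values.size();
            idByValue.put(key, id);
            values.add(key);
        }
        idPerDoc.add(id);
    }

    /** Returns, per document, the ordinal of its value among the sorted unique values. */
    int[] sortedOrds() {
        List<ByteBuffer> sorted = new ArrayList<>(values);
        // ByteBuffer.compareTo is lexicographic over signed bytes; a real
        // implementation would choose its own comparator.
        Collections.sort(sorted);
        int[] ordById = new int[values.size()];
        for (int ord = 0; ord < sorted.size(); ord++) {
            ordById[idByValue.get(sorted.get(ord))] = ord;
        }
        int[] ords = new int[idPerDoc.size()];
        for (int doc = 0; doc < ords.length; doc++) {
            ords[doc] = ordById[idPerDoc.get(doc)];
        }
        return ords;
    }

    public static void main(String[] args) {
        OrdBufferSketch buffer = new OrdBufferSketch();
        buffer.add("b".getBytes(StandardCharsets.UTF_8));
        buffer.add("a".getBytes(StandardCharsets.UTF_8));
        buffer.add("b".getBytes(StandardCharsets.UTF_8));
        System.out.println(Arrays.toString(buffer.sortedOrds())); // prints [1, 0, 1]
    }
}
```

With a buffer like this the codec would only see sorted bytes and ords at
flush time, which is the simplification idea 4 is after. RAM accounting is
left out of the sketch entirely.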