lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Build failed in Hudson: Lucene-trunk #1187
Date Fri, 14 May 2010 09:14:45 GMT
Wow, another issue caught by random testing!

On Fri, May 14, 2010 at 1:42 AM, Robert Muir <rcmuir@gmail.com> wrote:
> the problem is a logic bug (e.g. i have no clue how to really fix
> except to switch over to a UTF-8 sort order).
>
> in converting automaton to utf-8/32, and trying to emulate the utf-16
> term dictionary order, the byte transition ranges (although sorted in
> utf-16 order) are themselves in utf-8/32 order: e.g. a byte range of
> 0xe0-0xef is problematic during enumeration since the 0xee-0xef
> component should be "sorted last" in utf-16 order.

Ugh.  I suppose we could forcefully split such edges?  (We'd have to
fix reduce to not consolidate them).
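A minimal sketch of what "forcefully splitting" such an edge could look like, assuming the edge is a plain inclusive byte range on the first byte of a UTF-8 sequence. The class name `SplitByteEdge` and the `split` helper are hypothetical, not Lucene's automaton API. The cut points follow from the quoted description: lead bytes 0xee-0xef have to be enumerated separately because they sort last in UTF-16 order.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitByteEdge {
    // Cut points on the UTF-8 lead byte (assumption for this sketch):
    // 0xee-0xef must end up as its own range, so cut before 0xee and before 0xf0.
    private static final int[] CUTS = {0xee, 0xf0};

    /** Hypothetical helper: splits the inclusive range [min, max] into {min, max} pairs. */
    public static List<int[]> split(int min, int max) {
        List<int[]> out = new ArrayList<>();
        int start = min;
        for (int cut : CUTS) {
            // Only cut when the boundary falls strictly inside the remaining range.
            if (start < cut && cut <= max) {
                out.add(new int[] {start, cut - 1});
                start = cut;
            }
        }
        out.add(new int[] {start, max});
        return out;
    }

    public static void main(String[] args) {
        // The problematic range from the quoted message.
        for (int[] r : split(0xe0, 0xef)) {
            System.out.printf("0x%02x-0x%02x%n", r[0], r[1]);
        }
    }
}
```

For 0xe0-0xef this yields 0xe0-0xed and 0xee-0xef, which is only useful if a later reduce step leaves the two adjacent ranges unmerged.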

Or just cut over to UTF-8 order for trunk.
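For reference, a small sketch of where the two orders disagree, using only the standard library. The class name `Utf16VsUtf8Order` and the `compareUtf8` helper are illustrative, not part of Lucene. `String.compareTo` compares UTF-16 code units, so U+E000 (UTF-8 lead byte 0xee) sorts after a supplementary character, which starts with a surrogate in the 0xD800 range. In unsigned UTF-8 byte order the same pair sorts the other way around.

```java
import java.nio.charset.StandardCharsets;

public class Utf16VsUtf8Order {
    /** Illustrative helper: compares two strings by their unsigned UTF-8 bytes. */
    public static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            // Mask to treat bytes as unsigned values 0..255.
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmpHigh = "\uE000";             // U+E000: UTF-8 bytes ee 80 80
        String supplementary = "\uD800\uDC00"; // U+10000: UTF-8 bytes f0 90 80 80

        // UTF-16 code unit order: 0xE000 > 0xD800, so bmpHigh sorts last.
        System.out.println(bmpHigh.compareTo(supplementary) > 0);    // prints "true"
        // UTF-8 byte order: 0xee < 0xf0, so bmpHigh sorts first.
        System.out.println(compareUtf8(bmpHigh, supplementary) < 0); // prints "true"
    }
}
```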

> i know a workaround until we switch over, but it's gonna cause wasted
> seeks at the least (it's just wrong).

This is the FIXME you committed, right?  I.e., always seek...

Mike