lucene-dev mailing list archives

From Robert Muir
Subject Re: Why release 3.0?
Date Tue, 17 Nov 2009 02:44:10 GMT
Well, in all honesty there is a bit of complexity.
I leave the StandardTokenizer out of this, since it gives the same results
regardless of JVM version.
It may not be correct, but it's consistent; we could wait till 5.0 or 10.0 to
make it correct :)
Also, because it gives the same results regardless of JVM version, we can
actually use the Version logic to improve it, as Uwe showed.

The rest of it is where it gets nasty.
Fixing the Simple/StopAnalyzer is actually the worst, because we have to
deprecate the isTokenChar(char) and normalize(char) callbacks in favor of
int-based versions.
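
To illustrate why the char-based callbacks are a problem, here is a minimal sketch. It is not Lucene code: the class name and the two counting helpers are hypothetical, and the two isTokenChar overloads only stand in for the existing callback and an assumed int-based replacement. A char-based callback sees each half of a surrogate pair separately, so it can never recognize a supplementary letter.

```java
final class CodePointCallbackSketch {
    // Stand-in for the existing char-based callback (hypothetical body).
    static boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }

    // Stand-in for an int-based replacement that receives a whole code point.
    static boolean isTokenChar(int codePoint) {
        return Character.isLetter(codePoint);
    }

    // Walks the string one UTF-16 code unit at a time.
    static int countLettersByChar(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isTokenChar(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Walks the string one code point at a time.
    static int countLettersByCodePoint(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isTokenChar(cp)) {
                count++;
            }
            i += Character.charCount(cp);
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a' followed by U+10400, a letter outside the BMP (two chars in UTF-16).
        String s = "a" + new String(Character.toChars(0x10400));
        System.out.println(countLettersByChar(s));      // prints 1
        System.out.println(countLettersByCodePoint(s)); // prints 2
    }
}
```

The char-based walk misses the supplementary letter because neither surrogate half counts as a letter on its own.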
We also have to fix the I/O buffering logic present in, for example,
CharTokenizer, which just does things like refill a buffer of size 4096
without checking to ensure it doesn't break a surrogate pair.
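
One possible shape for a surrogate-safe refill is sketched below. This is an assumption about how it could be done, not the actual CharTokenizer fix: the class and method names are hypothetical, the buffer must hold at least two chars, and the input is assumed to be well-formed UTF-16. The idea is to keep one slot in reserve so a trailing high surrogate can be completed before the buffer is handed on.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class SurrogateSafeRefill {
    // Fills buf from reader and returns the number of valid chars.
    // The last slot is held back so that, if the regular fill ends on a
    // high surrogate, its low surrogate can be read into the same buffer.
    static int fill(Reader reader, char[] buf) throws IOException {
        int limit = buf.length - 1; // reserve one slot
        int len = 0;
        while (len < limit) {
            int n = reader.read(buf, len, limit - len);
            if (n == -1) {
                break; // end of stream
            }
            len += n;
        }
        if (len > 0 && Character.isHighSurrogate(buf[len - 1])) {
            int next = reader.read();
            if (next != -1) {
                buf[len++] = (char) next; // complete the pair
            }
        }
        return len;
    }

    public static void main(String[] args) throws IOException {
        // "ab", then U+10400 (two chars), then "c".
        String input = "ab" + new String(Character.toChars(0x10400)) + "c";
        Reader reader = new StringReader(input);
        char[] buf = new char[4];
        int len = fill(reader, buf);
        // A plain 3-char read would have stopped on the high surrogate.
        System.out.println(len);                               // prints 4
        System.out.println(Character.isLowSurrogate(buf[3]));  // prints true
    }
}
```

With a 4096-char buffer the same approach would consume 4095 chars normally and use the last slot only when a pair straddles the boundary.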

And then we have contrib...!

So you see why I ask about 'index backwards compatibility': I don't
consider it actually working between 2.9->3.0 anyway, and adding that on top
of fixing this stuff and ensuring API backwards compat is
especially nasty.

> Always depends though. This double index thing you mention is nasty (3.0
> and 3.1 for the unfortunate). I'd swallow a few careful deprecations in
> 3.0 to avoid that with my vote.
> --
> - Mark

Robert Muir
