lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java
Date Sun, 22 Nov 2009 21:06:37 GMT
I guess here is where I just say that Unicode and Java are optimized for
UTF-16 processing, so while I agree with byte[] being available in
places like this for flex indexing,
I'm already nervous about seeing code / optimizations that only work well
with Latin-1 and are very slow / buggy for anything else.
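
To make the concern concrete, here is a minimal sketch (not code from the flex branch; the class name EncodingWidths and its helper methods are made up for illustration). It shows how the same character has a different width in char[] (UTF-16) and byte[] (UTF-8) form, so code that assumes one unit per character only holds for a narrow range of text:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names exist in Lucene.
public class EncodingWidths {
    /** Number of UTF-16 code units, i.e. the length of the equivalent char[]. */
    static int utf16Units(String s) {
        return s.length();
    }

    /** Number of Unicode code points; a surrogate pair counts once. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Number of bytes in the UTF-8 encoding of the string. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "a";                  // U+0061
        String latin1 = "\u00e9";            // U+00E9, e with acute accent
        // U+10400 lies outside the BMP, so UTF-16 needs a surrogate pair.
        String supplementary = new String(Character.toChars(0x10400));

        for (String s : new String[] {ascii, latin1, supplementary}) {
            System.out.printf("U+%04X chars=%d codePoints=%d utf8Bytes=%d%n",
                s.codePointAt(0), utf16Units(s), codePoints(s), utf8Bytes(s));
        }
    }
}
```

Under these assumptions the ASCII letter is one unit in both encodings, the Latin-1 letter is one char but two UTF-8 bytes, and the supplementary character is two chars and four UTF-8 bytes. A loop that treats each char (or each byte) as a whole character is only right for the first case or two.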

On Sun, Nov 22, 2009 at 3:58 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> On Sun, Nov 22, 2009 at 3:52 PM, Robert Muir <rcmuir@gmail.com> wrote:
> >
> > On Sun, Nov 22, 2009 at 3:50 PM, Michael McCandless
> > <lucene@mikemccandless.com> wrote:
> >>
> >> Yeah I think there will be lots of optimizing we can do, after flex
> >> lands.
> >>
> >> Maybe stick w/ String for now?  But open an issue, today, to remind us
> >> to cutover to char[] post-flex?
> >
> > ok, i'll create one.
>
> Thanks.
>
> >> Doing all processing in UTF8 is tantalizing too ;)  This would mean no
> >> conversion of the terms data on iterating from the terms dict...
> >
> > let's please not go this route :) it's gonna be enough trouble fixing the
> > char[]-based code for Unicode 4, forget about byte[]
>
> I'll defer to you ;)
>
> Mike


-- 
Robert Muir
rcmuir@gmail.com
