lucene-java-user mailing list archives

From Paul Taylor
Subject Re: Best way to create own version of StandardTokenizer ?
Date Fri, 04 Sep 2009 16:54:35 GMT
Robert Muir wrote:
> On Fri, Sep 4, 2009 at 11:18 AM, Paul Taylor wrote:
>> I submitted this patch to
>> StandardTokenizerImpl; understandably it hasn't been incorporated into
>> Lucene (yet), but I need it for the project I'm working on. So would you
>> recommend keeping the same class name and just putting it on the classpath
>> before the lucene.jar, or creating a new Tokenizer, Impl and JFlex file in my
>> own project's package space?
> I would recommend creating one in your own package space.
>> Also, the StandardTokenizerImpl.jflex file states it should be compiled with
>> Java 1.4, not a later JDK. Is this just for backwards compatibility? Because
>> the indexes will be built afresh for this project, would I actually get
>> better results if I used a later JVM? The project has to deal with indexing
>> text which can be in any language, and I'm hoping that using the latest JVM may
>> solve some mapping problems with Japanese, Hebrew and Korean that I don't
>> really understand.
> I do not think you will really get better results, but it depends on what
> your issue is (can you elaborate?).
> Upgrading from Java 1.4 to 1.6 will bump your Unicode version from 3 to 4.
> You can see a list of the changes here:
Things like:


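The Unicode version point quoted above can be checked on a given JVM by asking it how it classifies a few sample characters. The following is a minimal sketch, not something from the thread: the class name UnicodeProbe and the three sample code points (one each from Hebrew, Japanese and Korean) are illustrative choices. It uses only java.lang.Character, so it reports the JVM's own character data and says nothing about how a JFlex-generated tokenizer will split text.

```java
public class UnicodeProbe {

    // Summarise how the running JVM classifies one code point:
    // its Unicode block, whether it counts as a letter, and its general category.
    public static String describe(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return String.format("U+%04X block=%s letter=%b type=%d",
                codePoint,
                block,
                Character.isLetter(codePoint),
                Character.getType(codePoint));
    }

    public static void main(String[] args) {
        // The JVM version determines which Unicode version backs java.lang.Character.
        System.out.println("java.version = " + System.getProperty("java.version"));

        // Hypothetical samples: HEBREW LETTER ALEF, HIRAGANA LETTER A, HANGUL SYLLABLE GA.
        int[] samples = { 0x05D0, 0x3042, 0xAC00 };
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```

Running the same class under two different JVMs and comparing the output for the characters that are causing trouble would show whether the JVM's character data differs for them at all.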