lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Best way to create own version of StandardTokenizer ?
Date Fri, 04 Sep 2009 16:54:35 GMT
Robert Muir wrote:
> On Fri, Sep 4, 2009 at 11:18 AM, Paul Taylor<paul_t100@fastmail.fm> wrote:
>   
>> I submitted this https://issues.apache.org/jira/browse/LUCENE-1787 patch to
>> StandardTokenizerImpl. Understandably it hasn't been incorporated into
>> Lucene (yet), but I need it for the project I'm working on. So would you
>> recommend keeping the same class name and just putting it on the classpath
>> before the lucene.jar, or creating a new Tokenizer, Impl and JFlex file in my
>> own project's package space?
>>     
>
> I would recommend creating one in your own package space.
>
>   
>> Also, the StandardTokenizerImpl.jflex file states it should be compiled with
>> Java 1.4, not a later JDK. Is this just for backwards compatibility? Because
>> the indexes will be built afresh for this project, would I actually get
>> better results if I used a later JVM? The project has to deal with indexing
>> text which can be in any language, and I'm hoping using the latest JVM may
>> solve some mapping problems with Japanese, Hebrew and Korean that I don't
>> really understand.
>>     
>
> I do not think you will really get better results, but it depends on what
> your issue is (can you elaborate?).
> Upgrading from 1.4 -> 1.6 will bump your Unicode version from 3 to 4.
> You can see a list of the changes here:
> http://www.unicode.org/versions/Unicode4.0.0/
>
>
>   
Things like:
 
http://bugs.musicbrainz.org/ticket/1006
http://bugs.musicbrainz.org/ticket/5311
http://bugs.musicbrainz.org/ticket/4827
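
For anyone wanting to check how a particular JVM classifies characters from these scripts, a minimal sketch follows. The class name UnicodeProbe and the three sample code points are assumptions chosen for illustration; they are not taken from the tickets above. It only reports what the JVM's own Unicode tables say and does not show how StandardTokenizer would split the text.

```java
// Hypothetical helper: prints how the running JVM classifies a code point.
// Uses only java.lang.Character methods available since Java 5.
public class UnicodeProbe {

    // Returns a one-line summary of the JVM's view of the given code point.
    static String describe(int codePoint) {
        return String.format("U+%04X letter=%b type=%d block=%s",
                codePoint,
                Character.isLetter(codePoint),
                Character.getType(codePoint),
                Character.UnicodeBlock.of(codePoint));
    }

    public static void main(String[] args) {
        // Sample code points (assumed for illustration):
        // Hiragana A, Hebrew Alef, Hangul syllable GA.
        int[] samples = {0x3042, 0x05D0, 0xAC00};
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```

Running the same class under two JVM versions and comparing the output is one way to see whether a Unicode table change affects the characters in question.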

Paul

