lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Best way to create own version of StandardTokenizer ?
Date Fri, 04 Sep 2009 16:02:43 GMT
On Fri, Sep 4, 2009 at 11:18 AM, Paul Taylor<paul_t100@fastmail.fm> wrote:
> I submitted this https://issues.apache.org/jira/browse/LUCENE-1787 patch to
> StandardTokenizerImpl. Understandably it hasn't been incorporated into
> Lucene (yet), but I need it for the project I'm working on. So would you
> recommend keeping the same class name and just putting it on the classpath
> before the lucene.jar, or creating a new Tokenizer, Impl and JFlex file in my
> own project's package space?

I would recommend creating one in your own package space.

> Also, the StandardTokenizerImpl.jflex file states it should be compiled with
> Java 1.4, not a later JDK. Is this just for backwards compatibility? Because
> the indexes will be built afresh for this project, would I actually get
> better results if I used a later JVM? The project has to deal with indexing
> text which can be in any language, and I'm hoping that using the latest JVM may
> solve some mapping problems with Japanese, Hebrew and Korean that I don't
> really understand.

I do not think you will really get better results, but it depends on what
your issue is (can you elaborate?).
Upgrading from 1.4 -> 1.6 will bump your Unicode version from 3.0 to 4.0.
You can see a list of the changes here:
http://www.unicode.org/versions/Unicode4.0.0/
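
If it helps to narrow the problem down, one rough way to see whether the JVM
matters is to print how it classifies a few of the characters that are giving
you trouble, then run the same class on each JVM and compare the output. This
is only a sketch: the class name UnicodeProbe and the three sample characters
are made up for illustration and are not from your patch or from Lucene. It
uses only java.lang.Character calls that take a char, so it should also
compile on 1.4.

```java
// Hypothetical helper (not part of Lucene): prints how the running JVM
// classifies individual characters, so output from two JVMs can be diffed.
class UnicodeProbe {

    // Builds a one-line summary of the JVM's view of a single char.
    static String describe(char c) {
        return "U+" + Integer.toHexString(c).toUpperCase()
            + " letter=" + Character.isLetter(c)
            + " type=" + Character.getType(c)
            + " block=" + Character.UnicodeBlock.of(c);
    }

    public static void main(String[] args) {
        // Sample characters chosen for illustration only:
        // Hebrew alef, Hiragana "a", Hangul syllable "ga".
        char[] samples = { '\u05D0', '\u3042', '\uAC00' };

        System.out.println("java.version=" + System.getProperty("java.version"));
        for (int i = 0; i < samples.length; i++) {
            System.out.println(describe(samples[i]));
        }
    }
}
```

Swap in the characters from your own failing documents. If the lines are
identical on both JVMs, the Unicode version is probably not the cause of the
mapping problem.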


-- 
Robert Muir
rcmuir@gmail.com
