lucene-java-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: are long words split into up to 256 long tokens?
Date Wed, 21 Apr 2010 13:50:15 GMT
> Is 256 some inner maximum in some Lucene internal that causes this?
> What is happening is that the long word is split into smaller words of
> up to 256 characters, and then the min and max limits are applied. Is
> that correct? I have removed LengthFilter and still see the splitting
> at 256 happen. I would like not to have this, and instead remove
> altogether any word longer than max, without decomposing it into
> smaller ones. Is there a way to achieve this?
>
> Using Lucene 3.0.1


Assuming your Tokenizer extends CharTokenizer:

CharTokenizer.java has this field: 
private static final int MAX_WORD_LEN = 255;

You can modify CharTokenizer.java according to your needs.
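
As a rough sketch of how the pieces could fit together (this is an illustration, not a tested recipe): once your patched copy of CharTokenizer uses a MAX_WORD_LEN larger than any word you expect, a LengthFilter after it will drop over-long tokens whole instead of seeing 255-character fragments. The MaxLengthAnalyzer class name and the use of WhitespaceTokenizer below are placeholders; substitute your own tokenizer.

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LengthFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

// Hypothetical analyzer for Lucene 3.0.1: the tokenizer used here should be
// your own CharTokenizer copy with MAX_WORD_LEN raised, so long words reach
// LengthFilter as a single token rather than 255-char pieces.
public final class MaxLengthAnalyzer extends Analyzer {

  private final int minLen;
  private final int maxLen;

  public MaxLengthAnalyzer(int minLen, int maxLen) {
    this.minLen = minLen;
    this.maxLen = maxLen;
  }

  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    // Replace WhitespaceTokenizer with your patched CharTokenizer subclass;
    // the stock one still cuts words into 255-char chunks before the filter runs.
    TokenStream stream = new WhitespaceTokenizer(reader);
    // Tokens shorter than minLen or longer than maxLen are discarded entirely.
    return new LengthFilter(stream, minLen, maxLen);
  }
}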




