lucene-java-user mailing list archives

From jm <jmugur...@gmail.com>
Subject Re: are long words split into up to 256 long tokens?
Date Wed, 21 Apr 2010 14:31:03 GMT
ok https://issues.apache.org/jira/browse/LUCENE-2407
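
For anyone reading the archive later, here is a rough sketch of the behaviour discussed in the quoted thread below. It is not Lucene code: the class LongWordSketch and its methods are made up for illustration, and Character.isLetter is only a stand-in for whatever isTokenChar does in a real tokenizer. The only detail taken from the thread is the 255 limit (MAX_WORD_LEN). The first method emits a token as soon as the buffer reaches the limit, which is why one long word comes out as several pieces; the second shows the behaviour I was after, where an overlong word is discarded whole.

```java
import java.util.ArrayList;
import java.util.List;

public class LongWordSketch {
    // Same value as the MAX_WORD_LEN field quoted from CharTokenizer.java.
    static final int MAX_WORD_LEN = 255;

    /** Mimics the splitting: a token is emitted once it reaches maxLen chars. */
    static List<String> splitAtLimit(String text, int maxLen) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {            // assumed stand-in for isTokenChar(c)
                current.append(c);
                if (current.length() == maxLen) {   // buffer full: emit early
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else if (current.length() > 0) {      // ordinary word boundary
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Desired behaviour: keep whole words only, discard any word longer than maxLen. */
    static List<String> dropLongWords(String text, int maxLen) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int wordLen = 0;
        // Run one step past the end so the last word is flushed like any other.
        for (int i = 0; i <= text.length(); i++) {
            boolean tokenChar = i < text.length() && Character.isLetter(text.charAt(i));
            if (tokenChar) {
                wordLen++;
                if (wordLen <= maxLen) {            // stop buffering once over the limit
                    current.append(text.charAt(i));
                }
            } else {
                if (wordLen > 0 && wordLen <= maxLen) {
                    tokens.add(current.toString());
                }
                current.setLength(0);
                wordLen = 0;
            }
        }
        return tokens;
    }

    /** Small helper so the sketch does not depend on String.repeat (Java 11+). */
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "short " + repeat('x', 600) + " tail";
        for (String t : splitAtLimit(text, MAX_WORD_LEN)) {
            System.out.println("split piece length: " + t.length());
        }
        System.out.println("kept words: " + dropLongWords(text, MAX_WORD_LEN));
    }
}
```

In this sketch a 600-character word is cut into pieces of 255, 255 and 90 characters by the first method, so a length filter applied afterwards only ever sees the pieces. That would explain why removing LengthFilter made no difference to the splitting.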

On Wed, Apr 21, 2010 at 4:18 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> Can you open a bug report to make this configurable, so we don't forget this?
> E.g. StandardTokenizer is able to change this.
>
> Thanks,
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: jm [mailto:jmuguruza@gmail.com]
>> Sent: Wednesday, April 21, 2010 3:59 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: are long words split into up to 256 long tokens?
>>
>> Oh, yes, it does extend CharTokenizer. Thanks, Ahmet. I had searched
>> the Lucene source code for 256 and found nothing suspicious, and that was
>> itself suspicious because it looked clearly like an inner limit. Of
>> course I should have searched for 255...
>>
>> I'll see how I proceed, because I don't want to use a custom build.
>>
>> On Wed, Apr 21, 2010 at 3:50 PM, Ahmet Arslan <iorixxx@yahoo.com>
>> wrote:
>> >> Is 256 some inner maximum too in some Lucene internal that causes
>> >> this? What is happening is that the long word is split into smaller
>> >> words of up to 256 characters, and then the min and max limits are
>> >> applied. Is that correct? I have removed LengthFilter and still see
>> >> the splitting at 256 happen. I would like to avoid this and remove
>> >> altogether any word longer than max, without decomposing it into
>> >> smaller ones. Is there a way to achieve this?
>> >>
>> >> Using lucene 3.0.1
>> >
>> >
>> > Assuming your Tokenizer extends CharTokenizer:
>> >
>> > CharTokenizer.java has this field:
>> > private static final int MAX_WORD_LEN = 255;
>> >
>> > you can modify CharTokenizer.java according to your needs.
>> >
>> >
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>> >
>>
>
>
>
>
>
