lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: are long words split into up to 256 long tokens?
Date Wed, 21 Apr 2010 14:18:36 GMT
Can you open a bug report to make this configureable, so we don't forget this? E.g. StandardTokenizer
is able to change this.

Thanks,
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: jm [mailto:jmuguruza@gmail.com]
> Sent: Wednesday, April 21, 2010 3:59 PM
> To: java-user@lucene.apache.org
> Subject: Re: are long words split into up to 256 long tokens?
> 
> oh, yes it does extend CharTokenizer..thanks Ahmet. I had searched
> lucene source code for 256 and found nothing suspicious, and that was
> itself suspicious cause it looked clearly like an inner limit. Of
> course I should have searched for 255...
> 
> I'll see how I proceed cause I don't want to use a custom build.
> 
> On Wed, Apr 21, 2010 at 3:50 PM, Ahmet Arslan <iorixxx@yahoo.com>
> wrote:
> >> Is 256 some inner maximum too
> >> in some
> >> lucene internal that causes this? What is happening is that
> >> the long
> >> word is split into smaller words up to 256 and then the min
> >> and max
> >> limit applied. Is that correct? I have removed LengthFilter
> >> and still
> >> see the splitting at 256 happen. I would like not to have
> >> this, and
> >> removed altogheter any word longer than max, wihtout
> >> decomposing into
> >> smaller ones. Is there a way to achieve this?
> >>
> >> Using lucene 3.0.1
> >
> >
> > Assuming your Tokenizer extends CharTokenizer:
> >
> > CharTokenizer.java has this field:
> > private static final int MAX_WORD_LEN = 255;
> >
> > you can modify CharTokenizer.java according to your needs.
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message