lucene-dev mailing list archives

From "viobade (JIRA)" <>
Subject [jira] Commented: (LUCENE-1491) EdgeNGramTokenFilter stops on tokens smaller then minimum gram size.
Date Wed, 03 Jun 2009 07:30:07 GMT


viobade commented on LUCENE-1491:

I think it is better to keep the main goal of n-grams: groups of characters whose length lies between min and max.
If a practical situation needs a minimum n-gram of one or two characters, this can be done by
setting the minimum accordingly; otherwise the filter must work the way it is expected to.
If I expect subwords with a minimum length of 3, why do I get a two-character token when it
does not satisfy that condition?
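A minimal sketch of the semantics argued for above, assuming front-edge grams only: a token shorter than the minimum gram size yields no grams at all, rather than being passed through unchanged. `EdgeGrams` and `frontEdgeGrams` are hypothetical names for illustration, not Lucene's API; the real filter operates on a Lucene `TokenStream`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not Lucene's API: front-edge n-grams of a single
 * token, restricted to lengths in [minGram, maxGram].
 */
final class EdgeGrams {
    static List<String> frontEdgeGrams(String token, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        // Never produce a gram longer than the token itself.
        int upper = Math.min(maxGram, token.length());
        // If the token is shorter than minGram, upper < minGram and the loop
        // body never runs: nothing is emitted, not even the token itself.
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(frontEdgeGrams("lucene", 3, 5)); // [luc, luce, lucen]
        System.out.println(frontEdgeGrams("ab", 3, 5));     // []
    }
}
```

With minGram = 3, the two-character token "ab" produces an empty list, which is the behavior the comment asks for.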

> EdgeNGramTokenFilter stops on tokens smaller then minimum gram size.
> --------------------------------------------------------------------
>                 Key: LUCENE-1491
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.4, 2.4.1, 2.9, 3.0
>            Reporter: Todd Feak
>            Assignee: Otis Gospodnetic
>             Fix For: 2.9
>         Attachments: LUCENE-1491.patch
> If a token is encountered in the stream that is shorter in length than the min gram size,
> the filter will stop processing the token stream.
> Working up a unit test now, but may be a few days before I can provide it. Wanted to
> get it in the system.
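A hedged sketch of the expected stream-level behavior, assuming front-edge grams and modelling the token stream as a plain `List<String>`: a token shorter than the min gram size is skipped and processing continues with the next token, instead of ending the stream as described in the report. `EdgeGramStream` and `filter` are hypothetical names, not Lucene's `EdgeNGramTokenFilter` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not Lucene's EdgeNGramTokenFilter: applies front-edge
 * n-gramming to every token, skipping (rather than stopping at) tokens
 * shorter than minGram.
 */
final class EdgeGramStream {
    static List<String> filter(List<String> tokens, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() < minGram) {
                // Per the report, the filter stopped processing the stream at
                // such a token; here we skip it and move on to the next one.
                continue;
            }
            int upper = Math.min(maxGram, token.length());
            for (int len = minGram; len <= upper; len++) {
                out.add(token.substring(0, len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("an", "apple", "pie");
        System.out.println(filter(tokens, 3, 4)); // [app, appl, pie]
    }
}
```

A unit test along these lines would put a short token before longer ones and check that grams are still produced for the later tokens.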

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

