lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: EdgeNgramTokenFilter and positions
Date Wed, 05 Sep 2012 20:47:14 GMT
I don't see a Jira for it, but I do see the bad behavior in both Solr 3.6 
and 4.0-BETA on the Solr admin analysis page.

Interestingly, the screenshot attached to LUCENE-3642 does in fact show the 
(improperly) incremented positions for successive n-grams.

See:
https://issues.apache.org/jira/browse/LUCENE-3642

I'm surprised that nobody noticed the bogus positions back then.

Technically, this is a Lucene issue.

-- Jack Krupansky

-----Original Message----- 
From: Walter Underwood
Sent: Wednesday, September 05, 2012 1:51 PM
To: solr-user@lucene.apache.org
Subject: EdgeNgramTokenFilter and positions

In the analysis page, the n-grams produced by EdgeNgramTokenFilter are at 
sequential positions. This seems wrong, because an n-gram is associated with 
a source token at a specific position. It also really messes up phrase 
matches.

With the source text "fleen", these positions and tokens are generated:

1,fl
2,fle
3,flee
4,fleen
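
To make the difference concrete, here is a rough sketch in plain Python. It is 
not Lucene code, only a simulation of the position bookkeeping. The names 
edge_ngrams and phrase_matches are made up for illustration, and the gram sizes 
(min 2, max 5) are an assumption chosen to reproduce the output above. With 
stacked=False every gram advances the position, as reported here; with 
stacked=True every gram keeps the position of its source token, which is the 
behavior I would expect.

```python
def edge_ngrams(tokens, min_gram=2, max_gram=5, stacked=True):
    """Return (position, gram) pairs for the leading-edge n-grams of each token.

    stacked=True  -> all grams of a token share that token's position
    stacked=False -> every gram advances the position (the reported behavior)
    """
    out = []
    pos = 0
    for tok_pos, token in enumerate(tokens, start=1):
        for n in range(min_gram, min(max_gram, len(token)) + 1):
            pos = tok_pos if stacked else pos + 1
            out.append((pos, token[:n]))
    return out


def phrase_matches(indexed, phrase):
    """True if the phrase terms occur at consecutive positions in `indexed`."""
    positions = {}
    for pos, term in indexed:
        positions.setdefault(term, set()).add(pos)
    return any(
        all(start + i in positions.get(term, set()) for i, term in enumerate(phrase))
        for start in positions.get(phrase[0], set())
    )


if __name__ == "__main__":
    # Sequential positions, as seen on the analysis page.
    print(edge_ngrams(["fleen"], stacked=False))
    # Stacked positions: every gram sits at the source token's position.
    print(edge_ngrams(["fleen"], stacked=True))

    # Prefix phrase "fl ba" against the two-token text "fleen bar".
    for stacked in (True, False):
        indexed = edge_ngrams(["fleen", "bar"], stacked=stacked)
        print(stacked, phrase_matches(indexed, ["fl", "ba"]))
```

With sequential positions, "ba" lands several positions after "fl", so a prefix 
phrase such as "fl ba" no longer lines up with the source text "fleen bar". 
Stacking the grams at the source token's position keeps adjacent tokens 
adjacent.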

Is this a known bug? Fixed? I'm running 3.3.

wunder
--
Walter Underwood
Search Guy
wunder@chegg.com



