lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steven A Rowe" <sar...@syr.edu>
Subject RE: Confused with NGRAM results
Date Thu, 28 Aug 2008 15:37:16 GMT
Hi gaz77,

Here's a good place to start:

<http://wiki.apache.org/jakarta-lucene/AnalysisParalysis>

Steve

On 08/28/2008 at 10:52 AM, gaz77 wrote:
> 
> Hi,
> 
> I'd appreciate if someone could explain the results I'm getting.
> 
> I've written a simple custom analyzer that applies the
> NGramTokenFilter to the token stream during indexing. It's never
> applied during searching. The purpose of this is to match sub-words.
> 
> Without the ngram filter, if I searched on the word 'postcode' it
> returns 2 documents. If I searched on 'code' it returns 6 documents
> (with no overlap on the postcode results).
> 
> If I apply the ngram filter with min 1 and max 10, searching 'postcode'
> returns the same 2 docs, while searching 'code' returns 9 docs. This
> sort-of feels right.
> 
> The problem comes when I set the min ngram size to 3, and the
> max to 5. Searching 'postcode' returns no results (as expected), but
> searching 'code' only returns 2 docs (the 2 normally returned by a
> 'postcode' search).
> 
> This last result for 'code' just doesn't seem correct - it should be
> returning at least the 6 docs from the original search.
> 
> I'd really appreciate some advice on what is going on with
> the ngram filter.
> 
> Thanks
> -- View this message in context: http://www.nabble.com/Confused-with-NGRAM-results-tp19202310p19202310.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message