lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Confused with NGRAM results
Date Fri, 29 Aug 2008 03:23:23 GMT
This actually sounds bugish to me, but you removed the text from your original email, so I
don't know what context this was in.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: gaz77 <gareth.cole@bit10.net>
> To: java-user@lucene.apache.org
> Sent: Friday, August 29, 2008 12:50:46 AM
> Subject: Re: Confused with NGRAM results
> 
> 
> Thanks for the pointer.
> 
> I've gone into this in some depth, using the AnalyzerUtils class from the
> lucene in action book.
> 
> It seems that the NGramTokenFilter is only processing part of the string
> that goes in. It stops tokenising the words part way through. That's why the
> documents weren't found in results.
> 
> I've had a look at the source code, and I think it's because the next()
> function returns null when it hits a token smaller than the min ngram size.
> For example, if I set the minimum to 3, then a 2-character token will cause
> it to return null.
> 
> I'm not sure if this is by design or a bug. either way, at least I know
> what's causing it now.
> 
> Cheers
> 
> 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Confused-with-NGRAM-results-tp19202310p19210665.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message