lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <>
Subject Re: More like this returning similarities that are too generic
Date Mon, 07 Aug 2006 20:29:42 GMT
Well, I expect that defining "less common" is tricky and doesn't lend itself
to a canned answer <G>. Would it work to create your own list of stop words
(possibly very large) to use for indexing and/or searching? This would
simply exclude the "less common" words (as you define them).
StandardAnalyzer, for instance, can take a File of stop words in one of its


On 8/7/06, Chad Hardin <> wrote:
> hi all,
> I'm new to lucene but I'm loving it!  I'm writing a prototype that
> links documents together based upon similarities.  Obviously the
> first thing I did was use MoreLikeThis.  However, it seems to be
> finding matches based upon words that are too common, in this case
> the words "from" and "can" and seems to be missing matches using the
> terms I would expect (in this case documents about "bikes").
> I seems I need a more custom tailored Filter that only passes through
> more less-common words.  Does something like this already exist?
> Thanks,
> Chad
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message