lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: More like this returning similarities that are too generic
Date Mon, 07 Aug 2006 20:29:42 GMT
Well, I expect that defining "less common" is tricky and doesn't lend itself
to a canned answer <G>. Would it work to create your own list of stop words
(possibly very large) to use for indexing and/or searching? This would
simply exclude the "less common" words (as you define them).
StandardAnalyzer, for instance, can take a File of stop words in one of its
constructors.......

Erick

On 8/7/06, Chad Hardin <chardin@topiatechnology.com> wrote:
>
> hi all,
>
> I'm new to lucene but I'm loving it!  I'm writing a prototype that
> links documents together based upon similarities.  Obviously the
> first thing I did was use MoreLikeThis.  However, it seems to be
> finding matches based upon words that are too common, in this case
> the words "from" and "can" and seems to be missing matches using the
> terms I would expect (in this case documents about "bikes").
>
> I seems I need a more custom tailored Filter that only passes through
> more less-common words.  Does something like this already exist?
>
> Thanks,
>
> Chad
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message