lucene-general mailing list archives

From David Alyea <dal...@gmail.com>
Subject Re: How to wildcard
Date Thu, 15 Nov 2012 21:48:47 GMT
OK, I tried that.  Had just Snowball and EdgeNGram
in both index and query.  When I ran the "sm3 carbon"
select, it went from 3,500 matches to 89,000!  So yes,
that edge building works!  But too much.  And... the
top score matches didn't look at all like "sm3 carbon"
products, and the shoes were nowhere in sight.  So,
I'll toy with it on a dev instance and see what I see.
I definitely like the idea and I can see that N-gram
tokens are going to behave like wildcarding.
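
One plausible reason for the jump in matches is that EdgeNGram ran at
query time as well, so each query word was itself expanded into short
prefixes that match a large share of the index. A minimal sketch of the
usual alternative follows: n-grams are built at index time only, and the
query side just tokenizes and lowercases. The field type name
"text_prefix" and the gram sizes are assumptions for illustration, not
values from this thread.

```xml
<!-- Hypothetical field type: edge n-grams at index time only. -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits leading-edge prefixes of each token; sizes are assumed. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- No EdgeNGram here: the typed text is matched as a single prefix. -->
  </analyzer>
</fieldType>
```

With this arrangement a query term such as "carb" matches the indexed
prefix of "carbon", while the query itself is not broken into shorter
grams. This is a deliberate exception to keeping index and query
analysis identical.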

On Thu, Nov 15, 2012 at 4:13 PM, Robert Muir <rcmuir@gmail.com> wrote:

> On Thu, Nov 15, 2012 at 9:44 AM, David Alyea <dalyea@gmail.com> wrote:
> >
> > to index:
> > <filter class="solr.PorterStemFilterFactory"/>
> > <filter class="solr.KStemFilterFactory"/>
> > <filter class="solr.EnglishMinimalStemFilterFactory"/>
> >
> > to query:
> > <filter class="solr.SnowballPorterFilterFactory" language="English" />
> >
>
> I don't think it's a good idea to use 4 different stemming algorithms
> (porter1, kstem, plural at index-time) and porter2 at query-time.
> This means you are analyzing terms in a totally different way at index
> time than you are at query-time.
>
> Just pick one of them: make your index-time and query-time analysis
> the same as a start and I think you will see fewer surprises.
>
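
The advice quoted above can be sketched as a field type with a single
analyzer chain. In Solr, an <analyzer> element with no "type" attribute
is used for both indexing and querying, so the two sides cannot drift
apart. The field type name "text_en_stem" and the choice of the Snowball
English stemmer are assumptions for illustration; any one of the
stemmers listed above could be substituted.

```xml
<!-- Hypothetical field type: one stemmer, same chain for index and query. -->
<fieldType name="text_en_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Exactly one stemming filter, applied identically on both sides. -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```

Documents need to be reindexed after a change like this, since terms
already in the index were produced by the old analysis chain.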
