lucene-general mailing list archives

From Upayavira ...@odoko.co.uk>
Subject Re: How to wildcard
Date Thu, 15 Nov 2012 21:59:18 GMT
Remember to distinguish between recall and precision - you're likely to
get too many results, but what matters is whether the first ones are
useful.

You could have two versions of your field, one with normal stemming and
another with n-grams, and boost the normal field above the n-gram one,
so that exact matches score above inexact matches.
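
As a rough sketch of that two-field setup in schema.xml, assuming a
stemmed field called "name" already exists. The names name, name_edge
and text_edge are made up for illustration, and the gram sizes are
guesses to tune:

```xml
<!-- Hypothetical names and sizes; adjust to your own schema. -->
<fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: the same chain runs at index and query time. -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>

<!-- Second copy of the same text, analyzed into edge n-grams. -->
<field name="name_edge" type="text_edge" indexed="true" stored="false"/>
<copyField source="name" dest="name_edge"/>
```

You would then search both fields with edismax and weight the stemmed
one higher, for example defType=edismax with qf=name^10 name_edge. The
^10 is an arbitrary starting point, not a recommended value.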

Upayavira

On Thu, Nov 15, 2012, at 09:48 PM, David Alyea wrote:
> OK, I tried that.  Had just Snowball and EdgeNGram
> in both index and query.  When I ran the "sm3 carbon"
> select, it went from 3,500 matches to 89,000!  So yes,
> that edge building works!  But too much.  And... the
> top score matches didn't look at all like "sm3 carbon"
> products, and the shoes were nowhere in sight.  So,
> I'll toy with it on a dev instance and see what I see.
> I definitely like the idea and I can see that N-gram
> tokens are going to behave like wildcarding.
> 
> On Thu, Nov 15, 2012 at 4:13 PM, Robert Muir <rcmuir@gmail.com> wrote:
> 
> > On Thu, Nov 15, 2012 at 9:44 AM, David Alyea <dalyea@gmail.com> wrote:
> > >
> > > to index:
> > > <filter class="solr.PorterStemFilterFactory"/>
> > > <filter class="solr.KStemFilterFactory"/>
> > > <filter class="solr.EnglishMinimalStemFilterFactory"/>
> > >
> > > to query:
> > > <filter class="solr.SnowballPorterFilterFactory" language="English" />
> > >
> >
> > I don't think it's a good idea to use 4 different stemming algorithms
> > (porter1, kstem, and plural at index-time, plus porter2 at query-time).
> > This means you are analyzing terms in a totally different way at index
> > time than you are at query-time.
> >
> > Just pick one of them: make your index-time and query-time analysis
> > the same as a start and I think you will see fewer surprises.
> >
