lucene-solr-user mailing list archives

From Martin Grotzke <martin.grot...@javakaffee.de>
Subject Re: Different search results for (german) singular/plural searches - looking for a solution
Date Tue, 16 Oct 2007 16:52:53 GMT
Hi,

I have now played around with the Snowball Porter stemmer and it definitely
feels really good (I used German2, as suggested).

For some cases (e.g. product types like top/tops, bermuda/bermudas or
hoody/hoodies) we additionally need synonyms. At first I thought it
would be good to use synonyms only at query time, but the docs in the
wiki recommend expanding synonyms at index time...

What is your experience? Would you also suggest applying them at index
time?
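
To make the index-time vs. query-time question concrete, here is a toy
sketch in Python. It is not Solr's synonym filter, only an illustration of
the idea: the synonym groups, the sample documents and all function names
are made up for this example. In this simple case both variants find the
same documents; the sketch does not model scoring or multi-word synonyms,
which is where the two approaches can differ in practice.

```python
# Toy illustration only: this is NOT Solr's synonym handling, just the idea.
# Hypothetical synonym groups based on the product types mentioned above.
SYNONYMS = {
    "top": {"top", "tops"},
    "tops": {"top", "tops"},
    "hoody": {"hoody", "hoodies"},
    "hoodies": {"hoody", "hoodies"},
}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def expand(tokens):
    """Replace each token by its whole synonym group (or by itself)."""
    expanded = set()
    for token in tokens:
        expanded |= SYNONYMS.get(token, {token})
    return expanded


def build_index(docs, expand_at_index_time):
    """Map each term to the set of ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        terms = expand(tokens) if expand_at_index_time else set(tokens)
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, expand_at_query_time):
    """Return the ids of all documents matching any of the query terms."""
    tokens = tokenize(query)
    terms = expand(tokens) if expand_at_query_time else set(tokens)
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits


# Made-up sample documents.
DOCS = {1: "blue hoody", 2: "red hoodies", 3: "white top"}

# Variant A: synonyms expanded while indexing, plain query.
index_a = build_index(DOCS, expand_at_index_time=True)
hits_index_time = search(index_a, "hoodies", expand_at_query_time=False)

# Variant B: plain index, synonyms expanded while querying.
index_b = build_index(DOCS, expand_at_index_time=False)
hits_query_time = search(index_b, "hoodies", expand_at_query_time=True)

# Baseline: no expansion anywhere, so only the literal term matches.
hits_no_synonyms = search(index_b, "hoodies", expand_at_query_time=False)
```

Variant A stores more terms per document, variant B sends more terms per
query; without any expansion the singular document is missed.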

On Thu, 2007-10-11 at 17:32 +0200, Thomas Traeger wrote:
> Martin Grotzke wrote:
> >> Try the SnowballPorterFilterFactory with German2 as language attribute 
> >> first and use synonyms for combined words i.e. "Herrenhose" => "Herren",
> >> "Hose".
> >>     
> > so you use a combined approach?
> >   
> Yes, we define the relevant parts of compound words (keywords only) as 
> synonyms and feed them into a special field that is used for searching and 
> for the product index. 
So you don't use a single catchall field "text"? What is the reason for
this, what is the advantage?

> I hope there will be a filter that can split 
> compound words sometime in the future...
So there is no standard approach for handling this problem apart from
synonyms?
Splitting compound words is exactly what jwordsplitter does (as posted by
Daniel)...
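
As a rough idea of what dictionary-based compound splitting looks like,
here is a minimal Python sketch. It does not reproduce jwordsplitter or its
API; the tiny word list and the function name split_compound are made up
for this example, and German linking elements (such as the "s" in
"Arbeitshose") are not handled at all.

```python
# Hypothetical mini-dictionary; a real splitter needs a large German word list.
DICTIONARY = {"herren", "damen", "hose", "jacke", "regen"}


def split_compound(word, dictionary=DICTIONARY, min_part_len=3):
    """Split `word` into dictionary words that concatenate to it.

    Returns the lowercased parts, or [word] (lowercased) if no complete
    split into dictionary words exists.
    """
    word = word.lower()

    def helper(rest):
        # Nothing left to split: the prefix chain so far is complete.
        if not rest:
            return []
        # Try the longest possible prefix first, then shorter ones.
        for end in range(len(rest), min_part_len - 1, -1):
            prefix = rest[:end]
            if prefix in dictionary:
                tail = helper(rest[end:])
                if tail is not None:
                    return [prefix] + tail
        # No prefix leads to a complete split.
        return None

    parts = helper(word)
    return parts if parts else [word]


parts_herrenhose = split_compound("Herrenhose")
parts_regenjacke = split_compound("Regenjacke")
parts_unknown = split_compound("Bermuda")
```

With this word list, "Herrenhose" is split into "herren" and "hose", while
a word without a complete split such as "Bermuda" is returned unchanged
(lowercased).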


Thanx && cheers,
Martin


> >> By using stemming you will maybe have some "interesting" results, but it 
> >> is much better to live with them than to have no or far fewer results ;o)
> >>     
> > Do you have an example of what "interesting" results I can expect, just to
> > get an idea?
> >   
> >> Find more information on the Snowball stemming algorithms here:
> >>
> >> http://snowball.tartarus.org/
> >>     
> > Thanx! I also had a look at this site already, but what is missing is a
> > demo where one can see what's happening. I think I'll play a little with
> > stemming to get a feeling for this.
> >   
> I think the Snowball stemmer is very good so I have no practical example 
> for you. Maybe this is of value to see what happens:
> 
> http://snowball.tartarus.org/algorithms/german/diffs.txt
> 
> If you have mixed languages in your content, which sometimes happens in 
> product data, you might get into some trouble.
> 
> >> Also have a look at the StopFilterFactory, here is a sample stopwordlist 
> >> for the german language:
> >>
> >> http://snowball.tartarus.org/algorithms/german/stop.txt
> >>     
> > Our application handles products, do you think such stopwords are useful
> > in this scenario as well? I wouldn't expect a user to search for "keine
> > hose" or something like this :)
> >   
> I have seen much worse queries, so you never know ;o)
> 
> think of a query like this: "Hose in blau für Herren"
> 
> You will definitely want to remove "in" and "für" during searching, and it 
> reduces index size when they are removed during indexing. Maybe you will even 
> get better scores when only relevant terms are used. You should optimize the 
> stopword list based on your data.
> 
> Regards,
> 
> Tom
> 
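
The stopword example quoted above can be sketched in a few lines of Python.
This only illustrates the filtering step, it is not Solr's
StopFilterFactory; the handful of stop words and the function name
remove_stopwords are made up for this example.

```python
# Hypothetical handful of German stop words; a real list would be tuned
# to the product data, as suggested above.
STOPWORDS = {"in", "für", "der", "die", "das", "und", "kein", "keine"}


def remove_stopwords(query, stopwords=STOPWORDS):
    """Lowercase the query, split on whitespace and drop stop words."""
    return [term for term in query.lower().split() if term not in stopwords]


terms = remove_stopwords("Hose in blau für Herren")
```

For the sample query only "hose", "blau" and "herren" remain as search
terms.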
-- 
Martin Grotzke
http://www.javakaffee.de/blog/
