lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: ****SPAM(5.0)**** Re: Highlight Wildcard Queries: Scores
Date Wed, 26 Jan 2011 15:00:47 GMT
Hi Wulf,

You should consider decompounding! There are filters based on dictionaries
that support decompounding german words. ItÂ’s a TokenFilter to be put into
your analysis chain.
There is a simple Lucene-Rule: Whenever you need wildcards think about your
analysis, you probably did something wrong :-) Add stemming, decompounding,
synonyms,...

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Wulf Berschin [mailto:berschin@dosco.de]
> Sent: Wednesday, January 26, 2011 3:56 PM
> To: java-user@lucene.apache.org
> Subject: Re: ****SPAM(5.0)**** Re: Highlight Wildcard Queries: Scores
> 
> Hi Erick,
> 
> good points, but:
> 
> our index is fed with german text. In german (in contrast to english)
nouns
> are just appended to create new words. E.g.
> 
> Kaffee
> Kaffeemaschine
> Kaffeemaschinensatzbehälter
> 
> In our scenario standard fulltext search on "Maschine" shall present all
of
> these nouns. That's why we add * before and after on each term.
> 
> Of course we provide an option "full words only" which finds none of
these.
> 
> Since we do not wrap * around words shorter than 4 characters we weren't
> yet faced with the too many clauses exception.
> 
> Greetings
> Wulf
> 
> Am 26.01.2011 15:18, schrieb Erick Erickson:
> > It is, I think, a legitimate question to ask whether scoring is
> > worthwhile on wildcards. That is, does it really improve the user
> > experience? Because the MaxBooleanClause gets tripped pretty quickly
> > if you add the terms back in, so you'd have to deal with that.
> >
> > Would your users be satisfied with sorting in the case of only
> > searching on wildcard characters by, say, dates or titles or subjects
> > or.....?
> >
> > Then just let the scoring work on fields that aren't wildcarded...
> >
> > This may not be reasonable in your particular case, but it's worth
> > considering before assuming that scoring is necessary on wildcard
> > terms.
> >
> > Best
> > Erick
> >
> > On Wed, Jan 26, 2011 at 9:10 AM, Wulf Berschin<berschin@dosco.de>
> wrote:
> >
> >> Now I have the highlighted wildcards but obviously the scoring is
> >> lost. I see that a rewrite of the wildcard query produces a constant
> >> score query. I added
> >>
> >>
> setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY
> _REWRIT
> >> E);
> >>
> >> to my QueryParser instance but no effect. What's to be done now?
> >>
> >> Wulf
> >>
> >>
> >>
> >> Am 26.01.2011 11:06, schrieb Wulf Berschin:
> >>
> >>> Thank you Alexander and Uwe, for your help.
> >>>
> >>> I read Marks explanation but it seems to me that his changes are not
> >>> contained in Lucene-3.0.3.
> >>>
> >>> So I commented out the rewrite, changed QueryTermScorer back to
> >>> QueryScorer and now I got the wildcard queries highlighted again.
> >>>
> >>> Wulf
> >>>
> >>>
> >>> --------------------------------------------------------------------
> >>> - To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message