lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wulf Berschin <bersc...@dosco.de>
Subject Re: Highlight Wildcard Queries: Scores
Date Wed, 26 Jan 2011 16:07:29 GMT
Hallo Uwe,

yes, thanks for the hint, that sounds good, but it seems to me I would 
then need more fields for all our search modes:

Now we have the fields "contents" without stoppwords and with stemming 
and "contents-unstemmed" whithout stemming.

The search options are:
- whole word (search "contents", no asterisks are being added before search)
- exact match (search "contents-unstemmed", implies whole word)

When decomposition comes into play I will need a third field 
"contents-undecomposed" (sorry) to perform the whole word search. 
Furthermore the contents-unstemmed should not be decomposed as well.

Would you still prefer this approach?

Viele Grüße aus Heidelberg
Wulf






Am 26.01.2011 16:00, schrieb Uwe Schindler:
> Hi Wulf,
>
> You should consider decompounding! There are filters based on dictionaries
> that support decompounding german words. It’s a TokenFilter to be put into
> your analysis chain.
> There is a simple Lucene-Rule: Whenever you need wildcards think about your
> analysis, you probably did something wrong :-) Add stemming, decompounding,
> synonyms,...
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Wulf Berschin [mailto:berschin@dosco.de]
>> Sent: Wednesday, January 26, 2011 3:56 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: ****SPAM(5.0)**** Re: Highlight Wildcard Queries: Scores
>>
>> Hi Erick,
>>
>> good points, but:
>>
>> our index is fed with german text. In german (in contrast to english)
> nouns
>> are just appended to create new words. E.g.
>>
>> Kaffee
>> Kaffeemaschine
>> Kaffeemaschinensatzbehälter
>>
>> In our scenario standard fulltext search on "Maschine" shall present all
> of
>> these nouns. That's why we add * before and after on each term.
>>
>> Of course we provide an option "full words only" which finds none of
> these.
>>
>> Since we do not wrap * around words shorter than 4 characters we weren't
>> yet faced with the too many clauses exception.
>>
>> Greetings
>> Wulf
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message