lucene-java-user mailing list archives

From Paul Libbrecht <p...@activemath.org>
Subject Re: Using org.apache.lucene.analysis.compound
Date Wed, 21 Oct 2009 09:27:25 GMT

I'm interested in this analyzer. It had escaped me, and it solves an old
problem!
Could you report on how you use it:
- did you have to feed words into a dictionary?
- does anyone have usage measurements already?
... and one last question, for the fun of research: is there any approach
for preferring Überwachungsgesetz as a concept over, say,
Fleischüberwachung? (Again, that could probably be based on a
dictionary.)

thanks in advance

paul


On 21 Oct 2009, at 04:00, Robert Muir wrote:

> hi, it will work because the analyzer will also decompound "Rindfleisch"
> into Rind and fleisch, with posIncr=0
>
> so if you index Rindfleischüberwachungsgesetz and then query with
> "Rindfleisch", it matches because Rindfleisch also gets decompounded into
> Rind and fleisch.
>
> On Tue, Oct 20, 2009 at 8:35 PM, Benjamin Douglas
> <bbdouglas@basistech.com> wrote:
>
>> Hello,
>>
>> I've found a number of posts in different places talking about how to
>> perform decompounding, but I haven't found many discussing how to use the
>> results of decompounding. If anyone can answer this question or point me
>> to an existing discussion, it would be very helpful.
>>
>> The description of the org.apache.lucene.analysis.compound package gives
>> the following example:
>>
>>       Rindfleischüberwachungsgesetz, 0, 29
>>       Rind, 0, 4, posIncr=0
>>       fleisch, 4, 11, posIncr=0
>>       überwachung, 11, 22, posIncr=0
>>       gesetz, 23, 29, posIncr=0
>>
>> And I see how this allows me to find single components such as "gesetz" or
>> "Rind". But what if I want to find combinations of components such as
>> "Rindfleisch" or "überwachungsgesetz"? It seems that the pattern of using
>> posIncr=0 for all components excludes the possibility of finding
>> substrings that are made up of multiple components.
>>
>> Any comments or thoughts would be appreciated.
>>
>> Ben Douglas
>
> -- 
> Robert Muir
> rcmuir@gmail.com
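
The reply quoted above argues that a query for "Rindfleisch" matches because the query term is decompounded too. A minimal Java sketch of that reasoning follows. It is not Lucene's compound filter: the class DecompoundSketch, its toy dictionary, and the analyze/matches helpers are all hypothetical, everything is lowercased for simplicity, and tokens stacked at the same position are treated as alternatives, which is an assumption about how the query side is interpreted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DecompoundSketch {

    // Hypothetical toy dictionary ("\u00fc" is the letter u-umlaut, escaped so
    // the file compiles regardless of source encoding). A real setup would
    // load a proper word list.
    static final Set<String> DICTIONARY = new HashSet<>(
            Arrays.asList("rind", "fleisch", "\u00fcberwachung", "gesetz"));

    /**
     * Returns the lowercased token followed by every dictionary word found
     * inside it. All returned tokens are meant to share one position,
     * mirroring the posIncr=0 entries in the package example.
     */
    static List<String> analyze(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        out.add(lower);
        // Brute-force scan of all substrings; fine for a sketch.
        for (int start = 0; start < lower.length(); start++) {
            for (int end = start + 1; end <= lower.length(); end++) {
                String part = lower.substring(start, end);
                if (DICTIONARY.contains(part) && !part.equals(lower)) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    /**
     * Assumption: same-position tokens act as alternatives, so a query word
     * matches an indexed word if they share at least one analyzed token.
     */
    static boolean matches(String indexedWord, String queryWord) {
        Set<String> indexed = new HashSet<>(analyze(indexedWord));
        for (String q : analyze(queryWord)) {
            if (indexed.contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String compound = "Rindfleisch\u00fcberwachungsgesetz";
        System.out.println(analyze(compound));
        System.out.println(analyze("Rindfleisch"));
        System.out.println(matches(compound, "Rindfleisch"));
    }
}
```

Under these assumptions, matches(...) returns true for the indexed compound and the query "Rindfleisch", because both sides produce "rind" and "fleisch". The same sketch also reports a match against any indexed word that contains only "Rind", so matching on shared parts is looser than matching the combination itself.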

