lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Using org.apache.lucene.analysis.compound
Date Wed, 21 Oct 2009 02:00:06 GMT
Hi, it will work because the query term "Rindfleisch" is also decompounded
into Rind and fleisch, with posIncr=0.

So if you index Rindfleischüberwachungsgesetz and then query with
"Rindfleisch", it matches, because the Rind and fleisch tokens produced from
the query are the same terms that were indexed for the compound.
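
To make that concrete, here is a toy sketch in plain Python. It is not Lucene's
API: the word list, the decompound() and matches() helpers, and the minimum
subword length are all assumptions made up for illustration. It also assumes
the tokens produced from the query are treated as alternatives, so sharing any
one term is enough for a hit.

```python
# Toy model of dictionary-based decompounding. NOT Lucene's API:
# DICTIONARY, decompound() and matches() are hypothetical names.
DICTIONARY = {"rind", "fleisch", "überwachung", "gesetz"}  # assumed word list


def decompound(token, dictionary=DICTIONARY, min_len=4):
    """Return the lowercased token followed by every dictionary word inside it."""
    lower = token.lower()
    terms = [lower]
    for start in range(len(lower)):
        for end in range(start + min_len, len(lower) + 1):
            part = lower[start:end]
            if part in dictionary and part != lower:
                terms.append(part)
    return terms


def matches(indexed_token, query_token):
    """True if the indexed token and the query token share at least one term.

    Assumes the same decompounding runs at index time and at query time,
    and that the query terms are combined as alternatives.
    """
    return bool(set(decompound(indexed_token)) & set(decompound(query_token)))


if __name__ == "__main__":
    print(decompound("Rindfleischüberwachungsgesetz"))
    print(decompound("Rindfleisch"))
    print(matches("Rindfleischüberwachungsgesetz", "Rindfleisch"))
```

Note that under these assumptions the same query would also hit a document
that contains only "Rind" or only "fleisch", so this kind of matching is
looser than an exact match on the two-part compound.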

On Tue, Oct 20, 2009 at 8:35 PM, Benjamin Douglas
<bbdouglas@basistech.com> wrote:

> Hello,
>
> I've found a number of posts in different places talking about how to
> perform decompounding, but I haven't found too many discussing how to use
> the results of decompounding. If anyone can answer this question or point me
> to an existing discussion it would be very helpful.
>
> In the description of the org.apache.lucene.analysis.compound package, it
> gives the following example:
>
>        Rindfleischüberwachungsgesetz, 0, 29
>        Rind, 0, 4, posIncr=0
>        fleisch, 4, 11, posIncr=0
>        überwachung, 11, 22, posIncr=0
>        gesetz, 23, 29, posIncr=0
>
> And I see how this allows me to find single components such as "gesetz" or
> "Rind". But what if I want to find combinations of components such as
> "Rindfleisch" or "überwachungsgesetz"? It seems that the pattern of using
> posIncr=0 for all components excludes the possibility of finding sub-strings
> that are made up of multiple components.
>
> Any comments or thoughts would be appreciated.
>
> Ben Douglas


-- 
Robert Muir
rcmuir@gmail.com
