lucene-java-user mailing list archives

From Benjamin Douglas
Subject Using org.apache.lucene.analysis.compound
Date Wed, 21 Oct 2009 00:35:19 GMT

I've found a number of posts in different places talking about how to perform decompounding,
but I haven't found too many discussing how to use the results of decompounding. If anyone
can answer this question or point me to an existing discussion it would be very helpful.

The description of the org.apache.lucene.analysis.compound package gives the following example:

	Rindfleischüberwachungsgesetz, 0, 29
	Rind, 0, 4, posIncr=0
	fleisch, 4, 11, posIncr=0
	überwachung, 11, 22, posIncr=0
	gesetz, 23, 29, posIncr=0

I see how this allows me to find single components such as "gesetz" or "Rind". But what
if I want to find combinations of components such as "Rindfleisch" or "überwachungsgesetz"?
It seems that using posIncr=0 for all components puts every component at the same position,
which rules out finding substrings that are made up of multiple adjacent components.
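
To make the question concrete, here is a minimal plain-Java sketch of one possible workaround. It does not use the Lucene API. It assumes the component offsets from the example above are available, and it derives every substring that spans two or more consecutive components, which could then be indexed as additional tokens. The names CompoundSpans and multiPartSpans are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: builds multi-component
// substrings from the start/end offsets of the decompounded parts.
final class CompoundSpans {

    // offsets[i] = {start, end} of component i within word, in order.
    // Returns the substrings covering components i..j for every i < j.
    static List<String> multiPartSpans(String word, int[][] offsets) {
        List<String> spans = new ArrayList<>();
        for (int i = 0; i < offsets.length; i++) {
            for (int j = i + 1; j < offsets.length; j++) {
                // Cutting from the original word keeps any linking
                // characters between components (the "s" at offset 22).
                spans.add(word.substring(offsets[i][0], offsets[j][1]));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // \u00fc is the letter u with umlaut, escaped to avoid
        // source-encoding problems.
        String word = "Rindfleisch\u00fcberwachungsgesetz";
        int[][] offsets = { {0, 4}, {4, 11}, {11, 22}, {23, 29} };
        for (String span : multiPartSpans(word, offsets)) {
            System.out.println(span);
        }
    }
}
```

With the offsets from the example, the result includes both "Rindfleisch" and "überwachungsgesetz". The trade-off is that the number of extra tokens grows quadratically with the number of components, so whether this is acceptable depends on how long the compounds in the collection are.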

Any comments or thoughts would be appreciated.

Ben Douglas
