lucene-java-user mailing list archives

From Magnus Johansson <mag...@technohuman.com>
Subject Re: Analyzing and Querying
Date Fri, 06 Aug 2004 11:28:14 GMT
You could create a custom analyzer that splits compound words into their
parts. That is, applying the analyzer to the word "bergbahn" would yield
the terms "berg" and "bahn".

Splitting compound words can be done quite effectively just by looking
up candidate parts in a large wordlist. I have done this for Swedish.
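
A minimal sketch of the wordlist idea, in plain Java without any Lucene
classes. The class name CompoundSplitter, the MIN_PART threshold and the
tiny wordlist are made up for illustration; a real wordlist would be much
larger, and this is not the exact code I used for Swedish. It tries the
longest matching part first and backtracks if the rest of the word cannot
be covered by wordlist entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CompoundSplitter {
    // Hypothetical minimum part length, so very short entries do not cause odd splits.
    private static final int MIN_PART = 3;

    /**
     * Splits a word into parts that all occur in the wordlist.
     * Returns the lowercased word as a single element if no split into
     * two or more wordlist entries exists.
     */
    public static List<String> split(String word, Set<String> wordlist) {
        String w = word.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        if (decompose(w, 0, wordlist, parts) && parts.size() > 1) {
            return parts;
        }
        return List.of(w);
    }

    // Backtracking search: try the longest candidate part first.
    private static boolean decompose(String w, int start, Set<String> wordlist, List<String> parts) {
        if (start == w.length()) {
            return true; // the whole word has been covered
        }
        for (int end = w.length(); end >= start + MIN_PART; end--) {
            String candidate = w.substring(start, end);
            if (wordlist.contains(candidate)) {
                parts.add(candidate);
                if (decompose(w, end, wordlist, parts)) {
                    return true;
                }
                parts.remove(parts.size() - 1); // dead end, undo and try a shorter part
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> wordlist = Set.of("berg", "bahn", "hof"); // toy wordlist
        System.out.println(split("bergbahn", wordlist)); // [berg, bahn]
        System.out.println(split("bahnhof", wordlist));  // [bahn, hof]
        System.out.println(split("lucene", wordlist));   // [lucene]
    }
}
```

In an analyzer you would probably want to emit the original compound as
well as its parts, so that a query for "bergbahn" still matches. Note
that this sketch does not handle linking elements between the parts
(such as the "s" that German often inserts), and a compound that is
itself in the wordlist is returned unsplit.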

/magnus


Tino Schöllhorn wrote:

> Hi,
>
> I have a problem which I'd like to understand - and perhaps it is also 
> possible to solve it ;-).
>
> I built an index using Lucene with the GermanAnalyzer. Now I have the 
> following phenomenon:
>
> - when searching for "bahn" the result contains hardly any "bergbahn"
>
> I am aware that the Lucene Query-Api supports wildcards, but as far as 
> I know I cannot add a * in front of a query-term.
>
> Do you have any suggestions how I could find "bergbahn" with the query 
> "bahn"? (this applies to other compound words as well).
>
> With regards
> Tino
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>



