lucene-java-user mailing list archives

From Oliver Christ <ochr...@EBSCO.COM>
Subject RE: real infix suggester, not AnalyzingInfixSuggester
Date Mon, 27 Oct 2014 11:47:20 GMT
The hard way may be to use the standard AnalyzingSuggester, but to add each (analyzed) suffix
of the surface string (mapping to the full surface form) during automaton generation.

I.e. when adding "Donau...", you add all analyzed suffixes "donau...", "onau...", "nau...",
... - all mapping to "Donau...", with identical rank. 

I think on equal inputs, the rank of the last one added wins, but I'm not sure.

You may "drown" in unspecific suggestions at least for short inputs, and the automata will
get large. But it should give you a suggester you can play around with to evaluate whether
you need decompounding (you probably do).
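
To make the suffix idea concrete, here is a minimal sketch in plain Java. It does not use the Lucene suggester classes: a map stands in for the automaton, lowercasing stands in for the analyzer, and the names (SuffixExpansionSketch, analyzedSuffixes, buildIndex, suggest, minLength) are all made up for illustration. Ranks are left out entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class SuffixExpansionSketch {

    // Stand-in for real analysis (assumption): lowercasing only.
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // All suffixes of the analyzed surface form with at least minLength characters,
    // longest first. A larger minLength limits unspecific matches on short inputs.
    static List<String> analyzedSuffixes(String surface, int minLength) {
        String analyzed = analyze(surface);
        int min = Math.max(1, minLength);
        List<String> suffixes = new ArrayList<>();
        for (int start = 0; start + min <= analyzed.length(); start++) {
            suffixes.add(analyzed.substring(start));
        }
        return suffixes;
    }

    // Maps every analyzed suffix to the full surface form(s) it was derived from.
    static Map<String, List<String>> buildIndex(List<String> surfaces, int minLength) {
        Map<String, List<String>> index = new LinkedHashMap<>();
        for (String surface : surfaces) {
            for (String suffix : analyzedSuffixes(surface, minLength)) {
                index.computeIfAbsent(suffix, k -> new ArrayList<>()).add(surface);
            }
        }
        return index;
    }

    // A prefix match against the suffix keys is an infix match against the surface form.
    // The linear scan is only for illustration; a real suggester would walk an automaton.
    static List<String> suggest(Map<String, List<String>> index, String query) {
        String q = analyze(query);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : index.entrySet()) {
            if (entry.getKey().startsWith(q)) {
                for (String surface : entry.getValue()) {
                    if (!hits.contains(surface)) {
                        hits.add(surface);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> surfaces = new ArrayList<>();
        surfaces.add("Donaudampfschifffahrtsgesellschaft");
        surfaces.add("Schifffahrt");
        Map<String, List<String>> index = buildIndex(surfaces, 3);
        System.out.println(suggest(index, "schiff"));
        System.out.println(index.size() + " keys for " + surfaces.size() + " surface forms");
    }
}
```

The key count printed at the end shows why the automata get large: each surface form contributes roughly one key per character.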

Cheers, Oli

-----Original Message-----
From: Michael Sokolov [mailto:msokolov@safaribooksonline.com] 
Sent: Monday, October 27, 2014 7:23 AM
To: java-user@lucene.apache.org
Subject: Re: real infix suggester, not AnalyzingInfixSuggester

Have you considered combining the AnalyzingInfixSuggester with a German decompounding filter?
If you break compound words into their constituent parts during analysis, then the suggester
will be able to do what you want (prefix matches on the word-parts). I found this project
with a quick Google search:
https://github.com/jprante/elasticsearch-analysis-decompound
I don't know how good it is or whether it fits with your environment, but it could be a start.
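
As a rough illustration of why decompounding helps, here is a toy sketch in plain Java. It is not the algorithm of the project linked above and it does not use any Lucene classes: the dictionary, the greedy longest-match splitting, and the names (DecompoundSketch, parts, matchesPrefixOfPart) are assumptions made up for this example. Lucene itself ships a dictionary-based DictionaryCompoundWordTokenFilter, which would be the more realistic building block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class DecompoundSketch {

    // Hypothetical toy dictionary; a real one would be a full German word list.
    static final Set<String> DICTIONARY = new HashSet<>(
            Arrays.asList("donau", "dampf", "schiff", "fahrt", "gesellschaft"));

    // Greedy split: at each position take the longest dictionary word that starts there;
    // if none matches, skip one character (this steps over linking letters such as "s").
    static List<String> parts(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (pos < w.length()) {
            int matchEnd = -1;
            for (int end = w.length(); end > pos; end--) {
                if (DICTIONARY.contains(w.substring(pos, end))) {
                    matchEnd = end;
                    break;
                }
            }
            if (matchEnd < 0) {
                pos++;
            } else {
                parts.add(w.substring(pos, matchEnd));
                pos = matchEnd;
            }
        }
        return parts;
    }

    // True if the query is a prefix of any word-part, which is what a token-prefix
    // suggester would see once the compound has been split during analysis.
    static boolean matchesPrefixOfPart(String word, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        for (String part : parts(word)) {
            if (part.startsWith(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String word = "Donaudampfschifffahrtsgesellschaft";
        System.out.println(parts(word));
        System.out.println(matchesPrefixOfPart(word, "schiff"));
        System.out.println(matchesPrefixOfPart(word, "chiff"));
    }
}
```

Note that this gives prefix matches on word-parts only: a query such as "chiff" that starts in the middle of a part still finds nothing, so the result depends on how good the dictionary is.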

-Mike

On 10/27/14 6:34 AM, Michael Breu wrote:
> Hello,
>
> I'm looking for an infix suggester that allows infix search for a given
> term. This might not be that important in English.
> However, in German we have quite complex compound words like
>      Donaudampfschifffahrtsgesellschaftskapitän
> which is composed of the nouns Donau (Danube), Dampf (steam), Schiff
> (ship), etc.
>
> So I would like to support searches like *schiff* to suggest
> Donaudampfschifffahrtsgesellschaft.
>
> I mistakenly tried the AnalyzingInfixSuggester first; however, it
> does not do what I expect, because it does prefix matches on tokens,
> not infix matches.
>
> I tried to adapt the AnalyzingSuggester; however, it seemed too complex
> for an easy conversion to an infix suggester.
>
> I know that this was already asked in
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201103.mbox/%3C1301054307585-2729996.post@n3.nabble.com%3E,
> but nobody answered that post as far as I know.
>
> Thank you for your help
>
> Wallenstein
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

