lucene-java-user mailing list archives

From Clemens Wyss DEV <clemens...@mysign.ch>
Subject AW: fuzzy/case insensitive AnalyzingSuggester )
Date Sat, 24 Jan 2015 13:50:34 GMT
I am back on this topic ;)

>Case- and diacritics insensitivity is supported out-of-the-box by the 
>analyzing suggesters, including the FuzzySuggester. 
>The logic is in the Analyzer.
So how do I force case-insensitivity?
I tried
...
	        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory</str>
	        <str name="ignoreCase=">true</str>
...
or
...
	        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingLookupFactory</str>
	        <str name="ignoreCase=">true</str>
...
to no avail

-----Original Message-----
From: Oliver Christ [mailto:ochrist@EBSCO.COM] 
Sent: Friday, June 20, 2014 15:52
To: java-user@lucene.apache.org
Subject: RE: fuzzy/case insensitive AnalyzingSuggester )

Hi Clemens,

I haven't yet built a suggester which combines all three, and am not aware of one. I'd love
to have one though ;-)

Case- and diacritics insensitivity is supported out-of-the-box by the analyzing suggesters,
including the FuzzySuggester. The logic is in the Analyzer.
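
To illustrate what "the logic is in the Analyzer" means in practice, here is a minimal sketch using only the Java standard library, not the Lucene API. The class and method names are hypothetical. It shows the kind of normalization an analyzer chain would apply to both the indexed entries and the typed query (in Lucene this would typically be done by lowercasing and folding token filters), so that matching happens on the normalized key:

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: mimics the effect of a
// lowercasing + diacritics-folding analyzer on a suggestion key.
final class SuggestKeyNormalizer {

    /** Decomposes accented characters, drops the combining marks, then lowercases. */
    static String normalize(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        return withoutMarks.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Z\u00fcrich" is "Zürich"; prints "zurich"
        System.out.println(normalize("Z\u00fcrich"));
    }
}
```

Because the same normalization runs at build time and at lookup time, the suggester itself never needs a separate case-insensitivity switch.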

I haven't yet tried out AnalyzingInfixSuggester, and haven't investigated whether it's possible
to combine that with FuzzySuggester (which also is an analyzing suggester).

Due to memory constraints, we build infix suggesters by adding each relevant substring, but
use WFST suggesters with payloads as the base, to reduce RAM load at runtime. We call the
analyzer in the dictionary iterator. At search time, we look up the surface form (completion)
in a secondary index using the payload as a key (and for deduping).
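
The substring-plus-payload approach described above could be sketched roughly as follows. This is a plain-Java stand-in under stated assumptions, not our actual implementation and not Lucene code: a sorted map plays the role of the automaton, "relevant substring" is assumed to mean every suffix starting at a word boundary, the payload is assumed to be an integer id, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: infix suggestions via prefix lookup on indexed substrings.
final class InfixByPayload {
    // Secondary index: payload id -> surface form (the completion shown to the user).
    private final Map<Integer, String> surfaceById = new HashMap<>();
    // Stand-in for the automaton: analyzed substring -> payload ids.
    private final TreeMap<String, Set<Integer>> idsByKey = new TreeMap<>();

    void add(int id, String surface) {
        surfaceById.put(id, surface);
        // Assumption: "analysis" is just lowercasing here.
        String analyzed = surface.toLowerCase(Locale.ROOT);
        // Index every suffix that starts at a word boundary.
        for (int i = 0; i < analyzed.length(); i++) {
            if (i == 0 || analyzed.charAt(i - 1) == ' ') {
                idsByKey.computeIfAbsent(analyzed.substring(i), k -> new LinkedHashSet<>()).add(id);
            }
        }
    }

    List<String> lookup(String query, int max) {
        String prefix = query.toLowerCase(Locale.ROOT);
        // Collect payload ids; the set dedupes entries reached via several substrings.
        Set<Integer> ids = new LinkedHashSet<>();
        for (Map.Entry<String, Set<Integer>> e : idsByKey.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // left the range of keys sharing the prefix
            }
            ids.addAll(e.getValue());
        }
        // Resolve the surface form from the secondary index, not from the key.
        List<String> out = new ArrayList<>();
        for (int id : ids) {
            if (out.size() == max) {
                break;
            }
            out.add(surfaceById.get(id));
        }
        return out;
    }
}
```

The point of the sketch is the split of responsibilities: the key structure only has to return payload ids, and the text shown to the user always comes from the secondary index.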

If FuzzySuggester supports payloads (haven't checked), you could get an infix suggester using
the same approach. That will lead to large automata, and as you'd have to look up the completion
in a secondary index, you'd never use the surface form returned by the automaton itself, so
it's a waste of space. WFSTs are more space-efficient but don't support payloads (if I remember
correctly) and there's no fuzzy WFST suggester either :(

Generally, we found it beneficial to not combine all functionality in a single suggester,
but use separate automata in a cascaded model. We first look up completions in the prefix
non-fuzzy suggester. Based on several criteria, we may then consult the infix suggester, and
if needed, the fuzzy suggester. The rationale is that we don't want high-ranking fuzzy or
infix hits to fill up the completion list while there are good (but less popular) prefix hits.
Having control over which suggester is used when, and how its specific suggestions are merged
into the final result list, helps improve the user experience, at least with our use cases.
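
A cascade like the one described above might look roughly like this. It is only a sketch: each stage is modeled as a plain function from query to ranked completions rather than a real Lucene suggester, the only criterion for consulting the next stage is assumed to be "the result list is not full yet" (the real criteria are more involved), and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of a cascaded suggester: prefix first, then infix, then fuzzy.
final class CascadedSuggester {
    private final int limit;
    private final List<Function<String, List<String>>> stages;

    /** Stages are consulted in the given order, e.g. prefix, infix, fuzzy. */
    CascadedSuggester(int limit, List<Function<String, List<String>>> stages) {
        this.limit = limit;
        this.stages = stages;
    }

    List<String> suggest(String query) {
        // Insertion-ordered set: hits from earlier stages keep their rank, duplicates are dropped.
        Set<String> merged = new LinkedHashSet<>();
        for (Function<String, List<String>> stage : stages) {
            if (merged.size() >= limit) {
                break; // assumed criterion: earlier stages already filled the list
            }
            for (String completion : stage.apply(query)) {
                if (merged.size() >= limit) {
                    break;
                }
                merged.add(completion);
            }
        }
        return new ArrayList<>(merged);
    }
}
```

With this ordering, a popular fuzzy or infix hit can never push a prefix hit out of the list; it only fills slots the prefix stage left empty.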

Cheers, Oli

-----Original Message-----
From: Clemens Wyss DEV [mailto:clemensdev@mysign.ch] 
Sent: Friday, June 20, 2014 6:47 AM
To: java-user@lucene.apache.org
Subject: AW: fuzzy/case insensitive AnalyzingSuggester )

Sorry for re-asking. 
Has anyone implemented an AnalyzingSuggester which 
- is fuzzy
- is case insensitive (or must/should this be implemented by the analyzer?)
- does infix search
[- has a small memory footprint]

-----Original Message-----
From: Clemens Wyss DEV [mailto:clemensdev@mysign.ch] 
Sent: Friday, June 13, 2014 14:53
To: java-user@lucene.apache.org
Subject: fuzzy/case insensitive AnalyzingSuggester )

Looking for an AnalyzingSuggester which supports
- fuzziness
- case insensitivity
- small (in-memory) footprint (*)

(*) Just tried to "hand" my big IndexReader (see other post "[lucene 4.6] NPE when calling
IndexReader#openIfChanged") into JaspellLookup. Got an OOM.
Is there any (Jaspell)Lookup implementation that can handle really big indexes (by swapping
out part of the "lookup-table")?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

