lucene-solr-user mailing list archives

From Finotti Simone <tech...@yoox.com>
Subject Re: Skip first word
Date Fri, 27 Jul 2012 09:46:47 GMT
Brilliant!
Thank you very much :)

________________________________________
From: Chantal Ackermann [c.ackermann@it-agenten.com]
Sent: Friday, 27 July 2012 11:20
To: solr-user@lucene.apache.org
Subject: Re: Skip first word

Hi Simone,

No, I meant that you populate both fields with the same input, which is best done via a
copyField directive.

The first field will contain ngrams of size 1 and 2. The other field will contain ngrams of
size 3 and longer (you might want to set a decent maxsize there).

The query for the autocomplete list uses the first field when the input typed by the user
is one or two characters long. Your examples were "D" or "G", and then "Do" or "Ga". That query
searches only the single-token field, which for the input "Dolce & Gabbana" contains
only the ngrams "D" and "Do". So only the input "D" or "Do" results in a hit on "Dolce
& Gabbana".
Once the user has typed the third letter ("Dol" or "Gab"), you query the second, word-tokenized
field, which for "Dolce & Gabbana" contains the ngrams "Dol", "Dolc", "Dolce", "Gab",
"Gabb", "Gabba", etc.
Both inputs "Gab" and "Dol" would then return "Dolce & Gabbana".

1. First field type:

<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="2" side="front"/>

2. Second field type:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- maybe add WordDelimiter etc. -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="10" side="front"/>

3. Field declarations:

<field name="short_prefix" type="short_ngram" … />
<field name="long_prefix" type="long_ngram" … />

<copyField source="short_prefix" dest="long_prefix" />
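
Put together, the two field types above might look like the following sketch. The type names short_ngram and long_ngram are taken from the field declarations; the LowerCaseFilterFactory and the separate query-time analyzers are assumptions added for illustration and are not part of the suggestion above. The idea behind the query analyzers is that the typed input is matched as-is against the indexed ngrams rather than being ngrammed again.

```xml
<!-- Sketch only: lowercasing and the query-time analyzers are assumptions. -->
<fieldType name="short_ngram" class="solr.TextField">
  <analyzer type="index">
    <!-- Whole input as one token, then prefixes of length 1 and 2 -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="2" side="front"/>
  </analyzer>
  <analyzer type="query">
    <!-- No ngram filter here: the user's input is already a prefix -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="long_ngram" class="solr.TextField">
  <analyzer type="index">
    <!-- One token per word, then prefixes of length 3 to 10 for each word -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="10" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this layout, the client would send inputs of one or two characters to short_prefix and longer inputs to long_prefix; that switch lives in the application that builds the query, not in the schema.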


Chantal

On 27.07.2012 at 11:05, Finotti Simone wrote:

> Hi Chantal,
>
> if I understand correctly, this implies that I have to populate different fields according
> to their length. Since I'm not aware of any logical condition you can apply to the copyField
> directive, it means that this logic has to be implemented by the process that populates the
> Solr core. Is this assumption correct?
>
> That's kind of bad, because I'd like to have this kind of "rule" in the Solr configuration.
> Of course, if that's the only way... :)
>
> Thank you
>
> ________________________________________
> From: Chantal Ackermann [c.ackermann@it-agenten.com]
> Sent: Thursday, 26 July 2012 18:32
> To: solr-user@lucene.apache.org
> Subject: Re: Skip first word
>
> Hi,
>
> use two fields:
> 1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2 for inputs of
> length < 3,
> 2. the other one tokenized as appropriate with minsize=3 and longer for all longer inputs
>
>
> Cheers,
> Chantal
>
>
> On 26.07.2012 at 09:05, Finotti Simone wrote:
>
>> Hi Ahmet,
>> business asked me to apply EdgeNGram with minGramSize=1 on the first term and with
>> minGramSize=3 on the subsequent terms.
>>
>> We are developing a search suggestion mechanism. The idea is that if the user types
>> "D", the engine should suggest "Dolce & Gabbana", but if the user types "G", it should suggest
>> other brands. Only if the user types "Gab" should it suggest "Dolce & Gabbana".
>>
>> Thanks
>> S
>> ________________________________________
>> From: Ahmet Arslan [iorixxx@yahoo.com]
>> Sent: Wednesday, 25 July 2012 18:10
>> To: solr-user@lucene.apache.org
>> Subject: Re: Skip first word
>>
>>> is there a tokenizer and/or a combination of filter to
>>> remove the first term from a field?
>>>
>>> For example:
>>> The quick brown fox
>>>
>>> should be tokenized as:
>>> quick
>>> brown
>>> fox
>>
>> There is no such filter that I know of. You can, though, implement one by modifying the
>> source code of LengthFilterFactory or StopFilterFactory. They both remove tokens. Out of
>> curiosity, what is the use case for this?
>>
>>
>>
>>
>
>
>
>
>





