lucene-solr-user mailing list archives

From Lukas Kahwe Smith <...@pooteeweet.org>
Subject Re: wildcards in stopword list
Date Wed, 03 Feb 2010 14:06:00 GMT

On 03.02.2010, at 14:34, Ahmet Arslan wrote:

> 
>> Actually I plan to write a bigger blog post about the
>> approach. In order to match the different fields I actually
>> have a separate core with an index dedicated to auto suggest
>> alone where I merge all fields together via some javascript
>> code:
>> 
>> This way I can then use terms for a single word entered and
>> a facet prefix search with the last term as the prefix and
>> the rest as the query for multi term entries into the auto
>> suggest box.
>> 
>> The idea is that I can then enter any part of any of the
>> fields, but I will then be suggested the entire phrase in
>> that field:
>> 
>> So if I have a field:
>> Foo Bar Ding Dong
>> 
>> and I enter "ding" into the search box, I would get a
>> suggestion of "Foo Bar Ding Dong"
> 
> If I am not wrong you have a list of suggestion candidates indexed in a separate core dedicated to auto suggest alone.
> 
> I think you can use this field type for suggestion.

First up:
I very much appreciate your input!

> <fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
> 
> With this field type, the query "ding" or "din" or "di" would return "Foo Bar Ding Dong".



Hmm, wouldn't it return "foo bar ding dong"?
Obviously I have to decide how important it is for me to get the original mixed-case string
for auto suggest, but it does matter a bit more over here in Europe than in the US, for example.

If I index both the original mixed-case version and a lowercased version, and remove
solr.LowerCaseFilterFactory from both analyzer sections, then it should work, as long as
terms that contain upper-case letters usually start with one.
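
As a rough, untested sketch of that variant: it is the field type quoted above with the lowercase filters taken out of both analyzers. The name "prefix_token_cs" is made up for illustration, and the lowercased copy of each value is assumed to be produced on the client side before indexing.

```xml
<!-- Sketch only: "prefix_token_cs" is a hypothetical name, not from the thread. -->
<fieldType name="prefix_token_cs" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- No LowerCaseFilterFactory: the edge n-grams keep the case of the input. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- No LowerCaseFilterFactory here either, so "Di" and "di" are different query terms. -->
  </analyzer>
</fieldType>
```

If both "Foo Bar Ding Dong" and "foo bar ding dong" are indexed into such a field, "Di" should match only the original string and "di" only the lowercased copy, while an input like "dI" would match neither.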

let me try this out ..

regards,
Lukas Kahwe Smith
mls@pooteeweet.org



