lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Autosuggest/Autocomplete with solr 1.4 and EdgeNGrams
Date Wed, 24 Feb 2010 15:35:48 GMT
You might also look at http://issues.apache.org/jira/browse/SOLR-1316

On Feb 24, 2010, at 1:17 AM, Sachin wrote:

> 
> 
> Hi All,
> 
> I am trying to set up autosuggest using Solr 1.4 for my site and need some pointers.
> Basically, we provide suggestions for the characters a user types into the search box.
> The autosuggest index is built from past user search queries that returned at least one
> result. We lazily write this information to the database and then export it to Solr
> nightly. As far as I know, there are three ways (apart from wildcard search) to implement
> autosuggest with Solr 1.4:
> 
> 1. Use EdgeNGrams
> 2. Use shingles and prefix query.
> 3. Use the new Terms component.
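For reference, approach 2 could look roughly like the field type below. This is a minimal sketch, not a tested configuration: the type name "suggest_shingle" and the shingle size of 3 are illustrative assumptions. Each stored query is split into words and indexed as word shingles, so "chicken pasta" yields the terms "chicken", "chicken pasta" and "pasta".

```xml
<!-- Hypothetical field type for approach 2 (shingles + prefix query).
     "suggest_shingle" and maxShingleSize="3" are illustrative choices. -->
<fieldType name="suggest_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split the stored query on whitespace only, so digits such as "123" survive -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit single words plus word shingles of up to 3 words,
         e.g. "chicken", "chicken pasta", "pasta" -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

At query time the typed text would then be sent as a prefix query against a field of this type, for example suggest_shingle:chicken\ pa* with the space escaped. Prefix queries are generally not run through the analyzer, so the input should be lowercased on the client before it is sent.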
> 
> For now I am more inclined towards EdgeNGrams (no method to the madness), and I just
> wanted to know whether any of the three is recommended in terms of performance, since users
> expect the suggestions to be almost instantaneous. We do heavy caching on our end to avoid
> hitting Solr every time, but is any of these approaches faster than the others?
> 
> I would also like to return a suggestion when the user's input matches in the middle of a
> stored query: for instance, if the query "chicken pasta" is in my index and the user types
> "pasta", I would like that query to be returned as a suggestion too (à la Yahoo!).
> Below is my field definition:
> 
>        <fieldType name="suggest" class="solr.TextField" positionIncrementGap="100">
>            <analyzer type="index">
>                <tokenizer class="solr.KeywordTokenizerFactory"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>                <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50"/>
>            </analyzer>
>            <analyzer type="query">
>                <tokenizer class="solr.KeywordTokenizerFactory"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>            </analyzer>
>        </fieldType>
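One way to get in-between matches without losing digits might be to tokenize on whitespace rather than on letters. The sketch below is a hypothetical variant of the field type above (the name "suggest_words" is made up for illustration): WhitespaceTokenizerFactory keeps "123" intact, and EdgeNGramFilterFactory then indexes the front-edge prefixes of every word instead of only the prefixes of the whole query.

```xml
<!-- Hypothetical variant of the "suggest" type; "suggest_words" is an illustrative name.
     Whitespace tokenization keeps non-letter tokens such as "123". -->
<fieldType name="suggest_words" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Front-edge grams of each word: "pasta" -> "pa", "pas", "past", "pasta" -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <!-- Split the typed text the same way, but do not n-gram it -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this type, typing "pas" matches an indexed gram of "pasta" and so returns "chicken pasta". One trade-off: a multi-word input such as "chicken pa" becomes two query terms, so with the default OR operator it also matches stored queries that contain only one of them; requiring all terms (the AND operator) would narrow that down.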
> 
> 
> I tried replacing KeywordTokenizerFactory with LetterTokenizerFactory, and although that
> works well for the scenario above (it does an in-between match), it has the side effect of
> discarding every character that is not a letter, so a user who types "123" gets no
> suggestions at all. Am I missing something in my configuration? Is this even achievable
> with EdgeNGrams, or should I look at the TermsComponent instead, perhaps after applying the
> regex patch from 1.5, and query with something like ".*user-typed-in-chars.*"?
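For the TermsComponent route, the component has to be registered in solrconfig.xml. The sketch below is modeled on the example solrconfig.xml shipped with Solr 1.4; treat the handler path "/terms" and the component name as assumptions to check against your own configuration.

```xml
<!-- Register TermsComponent; the name "termsComponent" is referenced by the handler below -->
<searchComponent name="termsComponent"
                 class="org.apache.solr.handler.component.TermsComponent"/>

<!-- Hypothetical handler path "/terms" that runs only the terms component -->
<requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <!-- Enable the component for every request to this handler -->
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>
```

A request such as /terms?terms.fl=suggest_query&terms.prefix=pas&terms.limit=10 (suggest_query being a hypothetical field name) would then list indexed terms that start with "pas". TermsComponent returns raw indexed terms, not stored values, so it works best against a field whose indexed terms are whole queries (KeywordTokenizerFactory plus LowerCaseFilterFactory, without n-grams); pointed at the EdgeNGram field it would return grams. terms.prefix only matches the start of a term, so an infix match like ".*user-typed-in-chars.*" still depends on the regex patch mentioned above.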
> 
> Thanks!
> 
> 
> 


