lucene-solr-user mailing list archives

From "David Yang" <>
Subject RE: Autocomplete with Filter Query
Date Fri, 10 Sep 2010 17:00:33 GMT
Cool idea!
I was suggesting a copy field because I want to provide autocomplete on
any field that dismax can search on. E.g. if dismax searches both name
and phone, then when users start typing a name or a phone number, I want
it to give autocompletion there.

So, to make sure I understand your idea, are you suggesting a field like this:

<field name="Autocomplete" multiValued="true" type="myngramsplitter"/>
<copyField source="name" dest="Autocomplete"/>
<copyField source="phone" dest="Autocomplete"/>

And searching like this: 
solr/core/select?q=Autocomplete:(dog wal)&fq=UserSelectedFilter

On a related note: how do you deal with the case where there's no
exact n-gram match, but there are some relevant n-grams? E.g. the user
types "dog wam" and there are no n-grams for "dog wam", but there are
n-grams for "dog wal" (for dog walking). This is probably not too
important, though, since prefix suggestion should mostly be enough.

-----Original Message-----
From: Jonathan Rochkind [] 
Sent: Friday, September 10, 2010 11:41 AM
Subject: RE: Autocomplete with Filter Query

I've been thinking about this too, and haven't come up with any GREAT
way. But there are several possible ways that will do different things,
good or bad, depending on the nature of your data and exactly what you
want to do. So here are some ideas I've been thinking about, but not a
ready-made solution for you.

One thing first: the statement about "copy field to copy all dismax
terms into one big field" doesn't exactly make sense. copyField is
something that happens at index time, whereas dismax is only used at
query time. Since it's only used at query time, just because you are
using dismax for your main search doesn't mean you have to use dismax
for your autocomplete query. The autocomplete query, which returns the
things you're going to display in your auto-complete list, can be set
up however you want. (We are talking about an auto-complete list, not
a "Google Instant" style autocomplete, right? The latter would
introduce even more issues.)

So, do you want the autocomplete to only match on the _entire query_ as
entered, or do you want an autocomplete for each word?  For instance, if
I enter "dog walking", should the autocomplete be autocompleting "dog
walking" as a whole, or should it be autocompleting "walking" by the
time I've typed in "dog walking"?  It's easier to set up to autocomplete
on the whole phrase. 
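To make the two options concrete, here is a small Python sketch (plain Python, not Solr) of which string each mode would try to complete. The helper name `completion_prefix` and its `per_word` flag are hypothetical, just for illustration.

```python
def completion_prefix(user_input: str, per_word: bool = False) -> str:
    """Return the string the autocomplete should try to complete.

    Whole-phrase mode completes everything typed so far; per-word mode
    completes only the last whitespace-separated word.
    (Hypothetical helper for illustration, not part of Solr.)
    """
    text = user_input.strip()
    if per_word:
        words = text.split()
        return words[-1] if words else ""
    return text


print(completion_prefix("dog walking"))                 # prints: dog walking
print(completion_prefix("dog walking", per_word=True))  # prints: walking
```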

Next, though, you probably want autocomplete to complete partial
words, not just whole words: "dog wal" should autocomplete to "dog
walking". That introduces an extra kink too. But let's assume we want
that.

So one idea. At index time, populate a field that will be used
exclusively for auto-completing. Make this field actually
_non-tokenizing_: probably a TextField type, but with the
KeywordTokenizer (i.e., the non-tokenizing tokenizer, heh). So if
you're indexing "dog walking", then the token in the field is actually
"dog walking", not ["dog","walking"]. Next, normalize it by removing
punctuation (because we probably don't want to consider punctuation for
auto-completing), and maybe normalize whitespace by collapsing any
adjacent whitespace to a single space and removing whitespace at the
beginning and end. So "   dog walking   " will index as "dog walking".
(This matters more at query time than at index time, but it's less
confusing to do the same normalization at both points.) That can be
done with a pattern-replace char filter (PatternReplaceCharFilterFactory).
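A minimal Python sketch of that normalization, handy for checking what the field should end up containing. It only imitates what the Solr analyzer chain would do; the regular expressions are my own assumption about what counts as "punctuation" and "whitespace", not Solr's definitions, and the function name `normalize` is hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Imitate the autocomplete field's normalization (sketch, not Solr).

    1. Drop punctuation (anything that is not a word character or whitespace).
    2. Collapse runs of whitespace to a single space.
    3. Trim leading/trailing whitespace.
    """
    no_punct = re.sub(r"[^\w\s]", "", text)
    collapsed = re.sub(r"\s+", " ", no_punct)
    return collapsed.strip()


print(normalize("   dog walking   "))  # prints: dog walking
print(normalize("dog,   walking!"))    # prints: dog walking
```

In Solr itself the same steps would live in the field's analyzer (pattern-replace filters plus trimming); check the exact filter names against your Solr version.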

But now we've also got to n-gram expand it. So if the term being
indexed is "dog walking", we actually want to store ALL of its prefixes
as terms in the index:
"d"
"do"
"dog"
"dog "
"dog w"
"dog wa"
...
"dog walking"

I.e., n-grams, but only expanded from the front. I believe you can
use the EdgeNGramFilterFactory for this (at index time only; you don't
want this one in your query-time analyzers). Although I haven't
actually tried the EdgeNGramFilterFactory with a non-tokenized field, I
think it should work. This will increase the size of your index,
hopefully not to a problematic degree.
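To see exactly which terms front-anchored n-gramming of a single untokenized token produces, here is a plain-Python imitation. It is not Solr's filter; `edge_ngrams` and its `min_gram`/`max_gram` parameters are hypothetical names loosely modelled on the filter's size limits.

```python
from typing import List, Optional


def edge_ngrams(term: str, min_gram: int = 1,
                max_gram: Optional[int] = None) -> List[str]:
    """Return the front-anchored n-grams (prefixes) of one untokenized term.

    Imitates what an edge n-gram filter would emit at index time when the
    whole phrase is a single token. Sketch only, not Solr code.
    """
    upper = len(term) if max_gram is None else min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("dog walking"))
# prints: ['d', 'do', 'dog', 'dog ', 'dog w', 'dog wa', 'dog wal',
#          'dog walk', 'dog walki', 'dog walkin', 'dog walking']
```

Each prefix is one more indexed term, which is where the index growth comes from. Note that "walking" on its own is not among them, so this only ever matches from the front of the phrase.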

Now, to actually do the auto-complete. At query time, take the whole
thing the user has entered and issue a query restricted to this
autocomplete field you've created, with whatever fq's you want too.
Use the "field" query parser: NOT "dismax" or "lucene", because we
don't want the query parser to pre-tokenize on whitespace, but not
"raw" either, because we DO want the input to go through the field's
query-time analyzers. One way to do this is:
<< q={!field f=my_autocomplete_field}the user's query >> (URL-encoded,
naturally).
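For the "URL-encoded, naturally" part, a small Python sketch using only the standard library. The field name `my_autocomplete_field` comes from above; the filter `occupation:Assistant`, the helper `autocomplete_query_string`, and the localhost URL are hypothetical placeholders for your own schema and setup.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical name: adjust to the autocomplete field in your schema.
AUTOCOMPLETE_FIELD = "my_autocomplete_field"


def autocomplete_query_string(user_input, filter_queries=()):
    """Build a URL-encoded query string for the autocomplete request.

    The {!field f=...} local param hands the user's whole input to the
    field query parser, so it is not split on whitespace before analysis.
    Each filter query becomes its own fq parameter.
    """
    params = [("q", "{!field f=%s}%s" % (AUTOCOMPLETE_FIELD, user_input))]
    params += [("fq", fq) for fq in filter_queries]
    return urlencode(params)


qs = autocomplete_query_string("dog wal", ["occupation:Assistant"])
print("http://localhost:8983/solr/select?" + qs)
print(parse_qs(qs)["q"])  # what the server decodes back out of the URL
```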

That's pretty much it. I think that should work, depending on the
requirements of 'work', although I haven't tried it yet.
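Since this is untested, here is a toy, in-memory Python simulation of the whole idea, with no Solr involved: the same normalization at index and query time, edge n-grams at index time only, and a filter applied before matching, the way an fq restricts the candidate documents. The sample documents, the `occupation` field, and the helper names are made up for illustration.

```python
import re
from typing import Dict, List, Set


def normalize(text: str) -> str:
    # Same normalization at index and query time: drop punctuation,
    # collapse whitespace, trim. No lowercasing, so matching is case-sensitive.
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", text)).strip()


def edge_ngrams(term: str) -> Set[str]:
    # Index-time only: every front-anchored prefix of the whole phrase.
    return {term[:n] for n in range(1, len(term) + 1)}


# Hypothetical sample documents: a name to autocomplete plus an occupation to filter on.
DOCS: List[Dict[str, str]] = [
    {"name": "Dog walking service", "occupation": "Assistant"},
    {"name": "Dogwood design", "occupation": "Designer"},
    {"name": "Doris Day", "occupation": "Assistant"},
]

# "Index": document position -> set of indexed prefix terms.
INDEX = {i: edge_ngrams(normalize(doc["name"])) for i, doc in enumerate(DOCS)}


def autocomplete(user_input: str, occupation: str) -> List[str]:
    query_term = normalize(user_input)  # no n-gramming at query time
    return [
        DOCS[i]["name"]
        for i, terms in INDEX.items()
        if DOCS[i]["occupation"] == occupation  # acts like the fq
        and query_term in terms                 # exact match on one indexed term
    ]


print(autocomplete("Dog wal", "Assistant"))  # prints: ['Dog walking service']
print(autocomplete("Do", "Assistant"))       # prints: ['Dog walking service', 'Doris Day']
print(autocomplete("walk", "Assistant"))     # prints: []
```

The last call shows the limitation discussed next: nothing matches from the middle of a phrase. You would probably also want a lowercase filter, which the normalization above does not include.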

Now, if you want the user's query to auto-complete by matching in the
middle of your terms, things get a lot more complicated. I.e., if you
want "walk" to auto-complete to "dog walking" too, this won't do that.
Also, if you want some kind of stemming to happen in auto-complete,
this won't do that either. And if you want to auto-complete not the
entire phrase the user has typed in, but each whitespace-separated word
as they type it, this won't do THAT either. Getting all those things
to work becomes even more complicated, especially with the requirement
that you be able to apply the fq's from your current search context to
the auto-complete. I haven't entirely thought through a possible way to
do all that.

But hopefully this gives you some clues to think about it. 

From: David Yang []
Sent: Friday, September 10, 2010 11:14 AM
Subject: Autocomplete with Filter Query


Is there any way to provide autocomplete while filtering results?
Suppose I had a bunch of people and each person has multiple
occupations. When I select 'Assistant' in a filter box, it would be nice
if autocomplete only provides assistant names, instead of all names. The
other issue is that I use DisMax to do my search (name, title, phone
number, etc.), so it might be more complex to do autocomplete. I could
have a copy field to copy all dismax terms into one big field.


