lucene-java-user mailing list archives

From Karl Wettin
Subject Re: Suggestive Search
Date Wed, 08 Apr 2009 22:34:17 GMT
If you use prefix grams only, then you'll get a forward-only suggestion
scheme. I've seen several implementations that use that, and it works
quite well.

harry potter: ^ha, ^har, ^harr, ^harry, ^harry p, ^harry po..
harry houdini: ^ha, ^har, ^harr, ^harry, ^harry h, ^harry ho..
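
As a rough sketch of how grams like the ones above could be generated
(plain Java, no Lucene involved; the class and method names are made up
for illustration, and skipping prefixes that end in a space is my
assumption to match the listing):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
class PrefixGrams {
    // Returns the prefixes of a phrase, from minLength characters up to the
    // whole phrase, each marked with '^' to anchor it to the phrase start.
    // Prefixes that end in whitespace are skipped.
    static List<String> of(String phrase, int minLength) {
        String normalized = phrase.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int end = Math.max(1, minLength); end <= normalized.length(); end++) {
            if (Character.isWhitespace(normalized.charAt(end - 1))) {
                continue;
            }
            grams.add("^" + normalized.substring(0, end));
        }
        return grams;
    }
}
```

Each gram would be indexed as a single untokenized term on the document
for its phrase, and the text typed so far (with the same ^ marker and
lowercasing) would be looked up as an exact term.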

I prefer the trie-pattern, though. I just remembered there is an old one
in LUCENE-625.
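
To illustrate the idea only (this is not the code from LUCENE-625, and
every name in it is hypothetical), a minimal in-memory trie for
forward-only suggestions might look like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a character trie; not taken from LUCENE-625.
class SuggestionTrie {
    private static final class Node {
        // TreeMap keeps children sorted, so suggestions come back in
        // alphabetical order of their lowercased form.
        final Map<Character, Node> children = new TreeMap<>();
        String phrase; // non-null only where a complete phrase ends
    }

    private final Node root = new Node();

    // Adds a phrase, keyed by its lowercased characters.
    void add(String phrase) {
        Node node = root;
        for (char c : phrase.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.phrase = phrase;
    }

    // Returns up to limit phrases that start with the given prefix.
    List<String> suggest(String prefix, int limit) {
        Node node = root;
        for (char c : prefix.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return new ArrayList<>();
            }
        }
        List<String> out = new ArrayList<>();
        collect(node, limit, out);
        return out;
    }

    // Depth-first walk below the prefix node, stopping at the limit.
    private void collect(Node node, int limit, List<String> out) {
        if (out.size() >= limit) {
            return;
        }
        if (node.phrase != null) {
            out.add(node.phrase);
        }
        for (Node child : node.children.values()) {
            collect(child, limit, out);
            if (out.size() >= limit) {
                return;
            }
        }
    }
}
```

With "Harry Potter", "Harry Houdini" and "My Dog Harry" added,
suggest("harr", 10) returns "Harry Houdini" and "Harry Potter" and leaves
out "My Dog Harry", since only the start of the phrase is matched.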


On 8 Apr 2009, at 20:50, Matt Schraeder wrote:

> Correct me if I'm wrong, but I don't think n-grams are really what I'm
> looking for here.  I'm not looking for a spellchecker or phrase checker
> style suggestive search, but only based on the exact phrases the user is
> currently typing.  Since Lucene uses term-based searching, I'm not sure
> how to have it search on portions of a full phrase.  Using a standard
> Lucene search, typing in "harr" will result in searching for "harr" as a
> term, which will not find "Harry Potter".  Using ngrams it would find
> "Harry" as a term, but not at the beginning of an entire phrase.  This
> would bring back "My Dog Harry" as a result, which isn't what I'm
> looking for. I just want phrases from fields beginning with "Harr"
> only.
>
> I could easily do this all with our database server by simply doing a
> query for "where searchqueries like 'harr%'" but we're trying to limit
> our hits to the database to keep speed up on the site.
>>>> 4/8/2009 12:49:45 PM >>>
> For this you probably want to use ngrams. Whether or not this is
> something that fits in your current index is hard to say. My guess is
> that you want to create a new index with one document per unique
> phrase. You might also want to try to load this index in an
> InstantiatedIndex, that could speed things up quite a bit if the
> corpus is not too large.
> If your suggestion text corpus is really large and you only want
> forward-only suggestions then you might want to consider a trie-
> pattern solution instead. These can be rather resource efficient, even
> when loaded to memory.
> If you have a lot of user load on your search engine then it might be
> interesting to use old user queries as the base of your suggestions
> and perhaps boost a bit on trends, i.e. the more people search for
> something, the more it gets boosted in the suggestions list.
>      karl
> On 8 Apr 2009, at 15:26, Matt Schraeder wrote:
>> I want to add a suggestive search similar to Google's to autocomplete
>> search phrases as the user types.  It doesn't have to be very
>> elaborate and for the most part will just involve searching single
>> fields.  How can I perform a search to be able to fill in autocomplete
>> text?  For instance, if I start typing "Harr" it should bring up
>> "Harry Potter", "Harry Houdini" and "Harry S. Truman".
>>
>> I have tried doing search queries for "Harr*" but it's still doing
>> term-based searching rather than searching a full field.  To make a
>> field both searchable as the full field as well as tokenized, would I
>> have to duplicate the field and make one a keyword field? Is there a
>> more convenient way to do this? I have also considered making a second
>> index for suggestive search, which would only have the fields that I
>> want to enable suggestive search on, but this seems like it would be
>> unnecessary duplication of data as well, though it would probably make
>> suggestive search faster due to a smaller index.
>>
>> Ideally it would also be nice to be able to rank these terms based on
>> the number of times they have been searched for so that the results
>> are tailored more to our users rather than simply just the score that
>> Lucene chooses.
