Date: Fri, 16 Jan 2009 10:10:42 +0100
From: Asbjørn A. Fellinghaug
To: java-user@lucene.apache.org
Subject: Re: Google finance-like suggestible search field

Hi again.

You can find additional information about this Bigram index here:
http://asbjorn.fellinghaug.com/blog/master-thesis/

The source code used to be available from the same site, but it has
disappeared. However, it can be downloaded from the computer science
department at NTNU in Norway:
http://daim.idi.ntnu.no/show.php?type=vedlegg&id=3429

Hope this helps.

Hayes, Peter:
> Thanks for your input. I will try and apply your suggestion.
>
> Thanks,
> Peter
>
> -----Original Message-----
> From: Asbjørn A. Fellinghaug [mailto:asbjorn@fellinghaug.com]
> Sent: Thursday, January 15, 2009 3:25 AM
> To: java-user@lucene.apache.org
> Subject: Re: Google finance-like suggestible search field
>
>
> Hi.
>
> Such 'autocompletion' features with Lucene could be provided with n-gram
> tokenizers, as Erick states. I made a 'Bigram' analyzer for my master
> thesis, when I was doing some research on how to enhance phrase
> searching. This Analyzer considers pairs of words as single terms.
>
> Basically, what the Bigram analyzer does is to index stopwords combined
> with the "previous" word, and with the "next" word. Single stopwords
> would not be indexed, as they demand a lot of resources during searches.
> Only the combinations prev+stopword and stopword+nextword would be
> indexed. This saves a lot during searching.
>
> Consider this sentence: "fetch me a beer honey" (where 'a' and 'me' are
> stopwords). The Bigram analyzer would index these 'Tokens':
> 'fetch', 'fetch me', 'me a', 'a beer', 'honey'.
>
> Erick Erickson:
> > You could look at the n-gram tokenizers (I confess I haven't used them
> > so I'm not all *that* familiar with them). Or you could make a rule like
> > "no autocomplete until the user types 3 characters" if that would work.
> >
> > Instead of forming a query, you might try using TermEnum, or
> > WildcardTermEnum or even RegexTermEnum to quickly get the list of terms
> > for your autocomplete.
> > The nice part about this approach is that you could quit after a
> > suitable number of terms were found rather than get them all. As I
> > remember, WildcardTermEnum is faster than RegexTermEnum, but don't hold
> > me to that. So I'd try WildcardTermEnum first, I think you'll find it
> > much more suitable than forming
> >
> > Best
> > Erick
>
> --
> Asbjørn A. Fellinghaug
> asbjorn@fellinghaug.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

--
Asbjørn A. Fellinghaug
asbjorn@fellinghaug.com
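
For readers who want to see the bigram rule from the quoted message in concrete form, here is a minimal sketch in plain Java. It is not the thesis code and does not use any Lucene classes; the class name BigramSketch, the method terms, and the whitespace tokenization are all assumptions made for illustration. The rule it implements is the one described above: a stopword is never emitted on its own, and a word pair is emitted whenever at least one of the two words is a stopword. Note that this sketch also emits 'beer' as a standalone term, which the token list in the quoted message does not include; the message does not say whether that was deliberate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the bigram rule; not the original analyzer.
final class BigramSketch {

    // Returns the terms that would be indexed for the given text.
    static List<String> terms(String text, Set<String> stopwords) {
        String[] words = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (words[i].isEmpty()) {
                continue; // empty input yields a single empty string
            }
            boolean stop = stopwords.contains(words[i]);
            // Non-stopwords are indexed as single terms; stopwords are not.
            if (!stop) {
                out.add(words[i]);
            }
            // A pair is indexed when either side is a stopword
            // (prev+stopword or stopword+next).
            if (i + 1 < words.length
                    && (stop || stopwords.contains(words[i + 1]))) {
                out.add(words[i] + " " + words[i + 1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(terms("fetch me a beer honey", Set.of("me", "a")));
        // prints [fetch, fetch me, me a, a beer, beer, honey]
    }
}
```

In a real analyzer the same decision would be made inside a token filter that looks one token ahead, rather than on a pre-split array.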
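
Erick's two suggestions in the quoted text (wait for 3 characters, then walk the term list and quit after enough hits) can be sketched without Lucene as well. In the sketch below a TreeSet stands in for the index's sorted term list; the names PrefixSuggestSketch, suggest, minChars and limit are assumptions for illustration, not Lucene API. With Lucene itself the loop would run over a TermEnum instead of a TreeSet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical stand-in for enumerating a sorted term dictionary.
final class PrefixSuggestSketch {

    // Returns up to `limit` terms starting with `prefix`, or nothing if the
    // prefix is shorter than `minChars`.
    static List<String> suggest(NavigableSet<String> sortedTerms,
                                String prefix, int minChars, int limit) {
        List<String> hits = new ArrayList<>();
        if (prefix.length() < minChars) {
            return hits; // "no autocomplete until the user types N characters"
        }
        // Start at the first term >= prefix; matching terms are contiguous.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (hits.size() >= limit || !term.startsWith(prefix)) {
                break; // quit early instead of collecting every term
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(
                List.of("general", "gold", "goldman", "goodyear", "google"));
        System.out.println(suggest(terms, "gol", 3, 10)); // prints [gold, goldman]
        System.out.println(suggest(terms, "go", 3, 10));  // prints []
    }
}
```

The early break is the point of the approach: the cost depends on how many suggestions are wanted, not on how many terms share the prefix.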