From: Andy Roberts
To: java-user@lucene.apache.org
Subject: Re: n-gram indexing
Date: Mon, 18 Jul 2005 22:55:46 +0000
Message-Id: <200507182255.46464.mail@andy-roberts.net>

On Monday 18 Jul 2005 21:27, Rajesh Munavalli wrote:
> At what point do I add n-grams?
> Does the order in which I add n-grams affect exact phrase queries later?
> My questions are:
>
> (1) Should I add all the 1-grams, followed by the 2-grams, followed by the
> 3-grams, etc., sentence by sentence, OR
>
> (2) Add all the 1-grams of the entire document first, before starting on the
> 2-grams for the entire document?
>
> What is the generally accepted notion of adding the n-grams of a document?
>
> thanks,
>
> Rajesh

I can't see any real advantage in storing n-grams explicitly. Just index the document and use phrase queries. Order is significant with phrase queries, if I recall correctly, although you can use a SpanNearQuery to look for unordered n-grams (though I don't know why you would want to!).

Perhaps if you explain a little more about what you are trying to achieve more generally, we can confirm that you don't need to mess with explicit indexing of n-grams.

Andy
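
To illustrate why phrase queries make explicit n-grams unnecessary, here is a minimal sketch in plain Java. It is not Lucene code and not how Lucene is implemented; the class PositionalIndexSketch and its methods are made up for illustration, and it assumes a single document split on non-word characters. The idea is only that once term positions are recorded at indexing time, an ordered phrase check (roughly what a PhraseQuery does) and an unordered proximity check (roughly what an unordered SpanNearQuery does) can both be answered without storing 2-grams or 3-grams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy positional index over one document; illustrative only, not Lucene. */
class PositionalIndexSketch {
    // Tokens in document order, and for each term the positions where it occurs.
    private final List<String> tokens = new ArrayList<>();
    private final Map<String, List<Integer>> postings = new HashMap<>();

    PositionalIndexSketch(String text) {
        // Assumption: lowercase and split on non-word characters.
        for (String t : text.toLowerCase().split("\\W+")) {
            if (t.isEmpty()) continue;
            postings.computeIfAbsent(t, k -> new ArrayList<>()).add(tokens.size());
            tokens.add(t);
        }
    }

    /** Ordered match: the terms must occur at consecutive positions, in this order. */
    boolean matchesPhrase(String... terms) {
        if (terms.length == 0) return false;
        for (int start : positions(terms[0])) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = positions(terms[i]).contains(start + i);
            }
            if (ok) return true;
        }
        return false;
    }

    /** Unordered match: some window of terms.length tokens contains exactly these terms. */
    boolean matchesUnordered(String... terms) {
        int k = terms.length;
        if (k == 0) return false;
        String[] wanted = new String[k];
        for (int i = 0; i < k; i++) wanted[i] = terms[i].toLowerCase();
        Arrays.sort(wanted);
        for (int start = 0; start + k <= tokens.size(); start++) {
            String[] window = tokens.subList(start, start + k).toArray(new String[0]);
            Arrays.sort(window);
            if (Arrays.equals(wanted, window)) return true;
        }
        return false;
    }

    private List<Integer> positions(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptyList());
    }

    public static void main(String[] args) {
        PositionalIndexSketch idx =
            new PositionalIndexSketch("The quick brown fox jumps over the lazy dog");
        System.out.println(idx.matchesPhrase("quick", "brown"));
        System.out.println(idx.matchesPhrase("brown", "quick"));
        System.out.println(idx.matchesUnordered("brown", "quick"));
    }
}
```

The document is indexed once, term by term, and any n-gram of any length can then be checked at query time, so the order in which n-grams would have been added never comes up.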