Subject: N-gram
From: "Rajesh Munavalli"
Date: Mon, 18 Jul 2005 13:41:51 -0700
Reply-To: general@lucene.apache.org

At what point do I add n-grams? Does the order in which I add n-grams affect exact phrase queries later?
My questions are:

(1) Should I add all the 1-grams, followed by the 2-grams, followed by the 3-grams, etc., sentence by sentence, OR
(2) add all the 1-grams of the entire document first, before starting on the 2-grams for the entire document?

What is the generally accepted approach to adding the n-grams of a document?

Thanks,

Rajesh
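
To make the two orderings in the question concrete, here is a minimal sketch in plain Python. It does not use the Lucene API: the function names (`ngrams`, `sentence_by_sentence`, `size_by_size`) are hypothetical, tokens are split on whitespace, and it assumes n-grams never cross a sentence boundary, which the question does not specify.

```python
def ngrams(tokens, n):
    """Return the word n-grams of a token list, in left-to-right order."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_by_sentence(sentences, max_n):
    """Option (1): for each sentence, emit its 1-grams, then 2-grams, and so on."""
    out = []
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            out.extend(ngrams(tokens, n))
    return out


def size_by_size(sentences, max_n):
    """Option (2): emit all 1-grams of the document, then all 2-grams, and so on.

    Assumption: n-grams are still built within each sentence, not across them.
    """
    out = []
    for n in range(1, max_n + 1):
        for sentence in sentences:
            out.extend(ngrams(sentence.split(), n))
    return out


if __name__ == "__main__":
    doc = ["a b c", "d e"]
    print(sentence_by_sentence(doc, 2))
    print(size_by_size(doc, 2))
```

Under these assumptions, both functions produce the same collection of n-grams and differ only in the order in which they are emitted. Whether that order matters to later phrase queries depends on how the terms and their positions are indexed, which this sketch does not model.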