From: Steven A Rowe
To: "java-user@lucene.apache.org"
Date: Mon, 2 Mar 2009 11:23:21 -0500
Subject: RE: N-grams with numbers and Shinglefilters

Hi Raymond,

On 3/2/2009 at 10:09 AM, Raymond Balmès wrote:
> suppose I have a tri-gram, what I want to do is index the tri-gram
> "string digit1 digit2" as one indexing phrase, and not index each token
> separately.

As long as you don't want any transformation performed on the phrase or its components, you can add your phrase as a "keyword", i.e. a non-analyzed string that will be indexed as-is.

Unless your phrase field will be the only field on this document (pretty unlikely), you'll want to use PerFieldAnalyzerWrapper[1] over KeywordAnalyzer[2] for the phrase field, and whatever other analyzer you like for the other document field(s).

AFAICT, you don't need ShingleFilter.

Steve

[1] PerFieldAnalyzerWrapper: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html
[2] KeywordAnalyzer: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/KeywordAnalyzer.html
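
To make the per-field idea concrete, here is a minimal, dependency-free sketch in plain Java. It is not Lucene code: the class name PerFieldSketch, the field names "phrase" and "body", and the lowercase/whitespace default analyzer are all assumptions chosen for illustration. It only models the behavior described above, where one field keeps its whole value as a single token and every other field is tokenized normally.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of per-field analysis; it does not use the Lucene API.
// In Lucene 2.4 the corresponding setup would be along the lines of
// new PerFieldAnalyzerWrapper(defaultAnalyzer) followed by
// addAnalyzer("phrase", new KeywordAnalyzer()).
class PerFieldSketch {

    // Keyword-style analysis: the whole field value becomes one token, unchanged.
    static final Function<String, List<String>> KEYWORD =
            text -> Collections.singletonList(text);

    // Assumed default for all other fields: lowercase, then split on whitespace.
    static final Function<String, List<String>> DEFAULT =
            text -> Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));

    // Field name -> analyzer override; fields not listed here use DEFAULT.
    private static final Map<String, Function<String, List<String>>> PER_FIELD =
            new HashMap<>();

    static {
        PER_FIELD.put("phrase", KEYWORD);
    }

    // Pick the analyzer registered for the field, falling back to DEFAULT.
    static List<String> analyze(String field, String text) {
        return PER_FIELD.getOrDefault(field, DEFAULT).apply(text);
    }

    public static void main(String[] args) {
        // The tri-gram stays together as a single indexed token.
        System.out.println(analyze("phrase", "price 12 34"));
        // Any other field is split into separate tokens.
        System.out.println(analyze("body", "The price was 12 34"));
    }
}
```

Because the phrase field yields exactly one token, a search against it only matches when the query supplies the identical string, including case and spacing.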