From: Raymond Balmès
To: java-user@lucene.apache.org
Date: Mon, 2 Mar 2009 18:26:18 +0100
Subject: Re: N-grams with numbers and Shinglefilters

Yes, I understand now that I don't need a ShingleFilter.

Yes, I will have many of these phrases in the documents, which is why I
thought I shouldn't use Lucene fields.

I will look further into your keyword approach; it sounds feasible. Thanks
for the tip. However, I presume I may need to normalize the phrases for the
search phase, so it may not work.

Keep in touch,

-RB-

On Mon, Mar 2, 2009 at 5:23 PM, Steven A Rowe wrote:

> Hi Raymond,
>
> On 3/2/2009 at 10:09 AM, Raymond Balmès wrote:
> > suppose I have a tri-gram, what I want to do is index the tri-gram
> > "string digit1 digit2" as one indexing phrase, and not index each token
> > separately.
>
> As long as you don't want any transformation performed on the phrase or its
> components, you can add your phrase as a "keyword", i.e. a non-analyzed
> string that will be indexed as-is.
>
> Unless your phrase field will be the only field on this document (pretty
> unlikely), you'll want to use PerFieldAnalyzerWrapper[1] over
> KeywordAnalyzer[2] for the phrase field, and whatever other analyzer you
> like for the other document field(s).
>
> AFAICT, you don't need ShingleFilter.
>
> Steve
>
> [1] PerFieldAnalyzerWrapper:
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html
> [2] KeywordAnalyzer:
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/KeywordAnalyzer.html
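
To make the keyword idea in the quoted reply concrete, here is a minimal plain-Java sketch of per-field analysis. It deliberately does not use Lucene's API: the class PerFieldAnalysisSketch, its methods, and the field names "phrase" and "body" are all hypothetical, chosen only for illustration. It shows a "phrase" field whose value stays one token, every other field being split into lowercase words, and one possible way to normalize a phrase before it is stored as a keyword (an assumption about what normalization might mean here, not something stated in the thread).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Illustration only: none of these names are Lucene classes or methods.
final class PerFieldAnalysisSketch {

    // "Keyword" analysis: the whole value becomes exactly one token, unchanged.
    static List<String> keywordTokens(String value) {
        return Collections.singletonList(value);
    }

    // Default analysis: lowercase, then split on runs of whitespace.
    static List<String> wordTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Assumed normalization: trim, collapse whitespace, lowercase.
    // It would have to be applied to both indexed and searched phrases.
    static String normalizePhrase(String phrase) {
        return phrase.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // Per-field overrides; the hypothetical "phrase" field uses keyword analysis.
    private static final Map<String, Function<String, List<String>>> PER_FIELD =
            new HashMap<>();

    static {
        PER_FIELD.put("phrase", PerFieldAnalysisSketch::keywordTokens);
    }

    // Pick the analysis registered for the field, or fall back to the default.
    static List<String> analyze(String field, String value) {
        return PER_FIELD
                .getOrDefault(field, PerFieldAnalysisSketch::wordTokens)
                .apply(value);
    }

    public static void main(String[] args) {
        System.out.println(analyze("phrase", "Widget 12 34")); // [Widget 12 34]
        System.out.println(analyze("body", "Widget 12 34"));   // [widget, 12, 34]
        System.out.println(analyze("phrase", normalizePhrase("  Widget   12 34 "))); // [widget 12 34]
    }
}
```

If the same normalization is applied to a phrase before indexing and before searching, the single stored token and the single query token should match exactly, which may address the normalization concern raised above.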