From: "Mark Miller" <markrmiller@gmail.com>
To: java-user@lucene.apache.org
Subject: Re: StandardAnalyzer question
Date: Fri, 21 Jul 2006 16:09:28 -0400
In-Reply-To: <51540A3DDD507D40B6D47030C8C0C19101A3A1F1@soumaiexcp01.iss.net>

| < #LETTER:                      // unicode letters
      [
       "\u0041"-"\u005a",
       "\u0061"-"\u007a",
       "\u00c0"-"\u00d6",
       "\u00d8"-"\u00f6",
       "\u00f8"-"\u00ff",
       "\u0100"-"\u1fff"
      ]
  >

becomes

| < #LETTER:                      // unicode letters
      [
       "\u0041"-"\u005a",
       "\u0061"-"\u007a",
       "\u00c0"-"\u00d6",
       "\u00d8"-"\u00f6",
       "\u00f8"-"\u00ff",
       "\u0100"-"\u1fff",
       "\u002d"
      ]
  >

On 7/21/06, Ngo, Anh (ISS Southfield) wrote:
>
> Hello Mark,
>
> Please show me how to add "-" to the #LETTER definition.
>
> Thanks,
>
> Anh Ngo
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Friday, July 21, 2006 3:51 PM
> To: java-user@lucene.apache.org
> Subject: Re: StandardAnalyzer question
>
> I do not believe so. If you look above you will see that #P is only used
> when looking for a num: a host ip, a phone number, etc. You will be
> removing that ability to recognize a "_" while rooting those tokens out.
> It will still be parsed when tokenizing an EMAIL as well. I don't think
> this is the behavior you want.
>
> - Mark
>
> On 7/21/06, Ngo, Anh (ISS Southfield) wrote:
> >
> > What is the #LETTER definition in StandardTokenizer.jj?
> >
> > I saw:
> >
> > | <#P: ("_"|"-"|"/"|"."|",") >
> > | <#HAS_DIGIT:                  // at least one digit
> >     (<LETTER>|<DIGIT>)*
> >     <DIGIT>
> >     (<LETTER>|<DIGIT>)*
> >   >
> >
> > Should I remove "_" and recompile the source code?
> >
> > Sincerely,
> >
> > Anh Ngo
> >
> > -----Original Message-----
> > From: Daniel Naber [mailto:lucenelist2005@danielnaber.de]
> > Sent: Friday, July 21, 2006 2:49 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: StandardAnalyzer question
> >
> > On Friday 21 July 2006 16:16, Ngo, Anh (ISS Southfield) wrote:
> >
> > > The Lucene 2.0.0 StandardAnalyzer does treat the "_" (underscore) as a
> > > token. Is there a way I can stop StandardAnalyzer from tokenizing on
> > > "_" or any given characters?
> >
> > You need to add "_" to the #LETTER definition in StandardTokenizer.jj,
> > then rebuild StandardTokenizer.java using the appropriate ant task.
> >
> > Regards
> > Daniel
> >
> > --
> > http://www.danielnaber.de
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
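
For anyone who wants to see the effect of the #LETTER change without rebuilding the grammar, here is a rough sketch in plain Java. It is not StandardTokenizer and does not use any Lucene API: the class LetterClassSketch and its methods isLetter and tokenize are hypothetical names made up for this illustration. It only copies the character ranges from the #LETTER definition in this thread and splits text into runs of those characters, with "\u002d" (the hyphen) optionally counted as a letter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the real tokenizer is generated from
// StandardTokenizer.jj and has more rules (numbers, email, hosts, ...).
public class LetterClassSketch {

    // Same ranges as the #LETTER definition quoted in this thread.
    // When hyphenIsLetter is true, 0x002d ("-") is accepted as well.
    static boolean isLetter(char c, boolean hyphenIsLetter) {
        return (c >= 0x0041 && c <= 0x005a)
            || (c >= 0x0061 && c <= 0x007a)
            || (c >= 0x00c0 && c <= 0x00d6)
            || (c >= 0x00d8 && c <= 0x00f6)
            || (c >= 0x00f8 && c <= 0x00ff)
            || (c >= 0x0100 && c <= 0x1fff)
            || (hyphenIsLetter && c == 0x002d);
    }

    // Splits text into maximal runs of "letter" characters.
    static List<String> tokenize(String text, boolean hyphenIsLetter) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isLetter(c, hyphenIsLetter)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Original ranges: the hyphen separates tokens.
        System.out.println(tokenize("well-known term", false)); // [well, known, term]
        // With "\u002d" added: the hyphenated word stays together.
        System.out.println(tokenize("well-known term", true));  // [well-known, term]
    }
}
```

One side effect to keep in mind: once "-" counts as a letter, a hyphen standing on its own between spaces also forms a token in this sketch, so it may be worth checking how the rebuilt tokenizer treats such input.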