From: "Erick Erickson"
To: java-user@lucene.apache.org
Date: Tue, 29 Aug 2006 16:57:44 -0400
Subject: Re: Installing a custom tokenizer

Tucked away in the contrib section of Lucene (I'm using 2.0) there is
org.apache.lucene.index.memory.PatternAnalyzer, which takes a regular
expression as a parameter and tokenizes with it. Would that help?

Word of warning: the regex determines what is NOT a token, not what IS
a token (as I remember), which threw me for a bit.

Don't know if this is really useful, but it might work for you without
as much work...

Best
Erick@I'mNowBeyondMyCompetence.WhyDoTheyStillEmployMeHere?

On 8/29/06, Bill Taylor wrote:
>
> On Aug 29, 2006, at 2:47 PM, Chris Hostetter wrote:
>
> > : Have a look at PerFieldAnalyzerWrapper:
> > :
> > http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html
> >
> > ...which can be specified in the constructors for IndexWriter and
> > QueryParser.
>
> As I understand it, this allows me to specify a different analyzer for
> each field name. My problem is that the standard analyzer will not
> work for my content field and I need to define a new one. I need to
> make a modification to the StandardTokenizer so that a number does not
> need to have a digit in every other segment of a part number.
>
> For example, the StandardTokenizer breaks aa-bb-2 on the - between aa
> and bb because it demands that every other string between a - have a
> digit.
>
> I need to modify the .jj file for the Standard Tokenizer and get a new
> one, but I am confused by the javaCC documentation and do not know how
> to run it to get what I need.
>
> Thanks for the help.
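Erick's warning can be checked without Lucene at all. PatternAnalyzer is documented as separating text the way String.split does, so its regex describes the separators, and whatever the regex matches is thrown away. The plain-Java sketch below is illustrative only: SplitVsMatch and its methods are hypothetical names, not Lucene classes, and it ignores PatternAnalyzer's lowercasing and stop-word options. It shows that a separator class which leaves out '-' keeps Bill's aa-bb-2 as one token, that a broader separator such as \W+ breaks it apart, and that describing the same tokens positively means negating the character class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Lucene.
final class SplitVsMatch {

    // Split semantics (the behaviour PatternAnalyzer is documented to have):
    // the regex matches the separators, which are discarded; the text
    // between them becomes the tokens.
    static List<String> splitTokens(String text, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(separatorRegex).split(text)) {
            // A separator at the very start yields an empty first piece; skip it.
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Match semantics: every match of the regex is itself a token.
    static List<String> matchTokens(String text, String tokenRegex) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(tokenRegex).matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "part aa-bb-2, see also x-10";

        // Separators: whitespace and common punctuation, deliberately NOT '-'.
        String keepHyphens = "[\\s,;.:!?()\\[\\]]+";
        System.out.println(splitTokens(text, keepHyphens));
        // prints [part, aa-bb-2, see, also, x-10]

        // A broader separator such as \W+ also matches '-', so part numbers break up.
        System.out.println(splitTokens("aa-bb-2", "\\W+"));
        // prints [aa, bb, 2]

        // To describe the same tokens positively, negate the character class.
        System.out.println(matchTokens(text, "[^\\s,;.:!?()\\[\\]]+"));
        // prints [part, aa-bb-2, see, also, x-10]
    }
}
```

If you hand the separator regex to something that expects token patterns (or vice versa), you get the leftovers instead of the tokens, which is exactly the mix-up described above.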
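Bill's reading of Chris's pointer is right: PerFieldAnalyzerWrapper chooses an analyzer by field name and falls back to a default for every other field, so it is what would route only the content field through a custom tokenizer. In Lucene the wrapper is built from a default analyzer plus addAnalyzer(fieldName, analyzer) calls. The sketch below does not use Lucene. PerFieldTokenizers is a hypothetical plain-Java class that mimics only that dispatch idea, with string-splitting functions standing in for real analyzers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical plain-Java analogue of per-field analysis; NOT Lucene's API.
final class PerFieldTokenizers {
    private final Function<String, List<String>> defaultTokenizer;
    private final Map<String, Function<String, List<String>>> overrides = new HashMap<>();

    PerFieldTokenizers(Function<String, List<String>> defaultTokenizer) {
        this.defaultTokenizer = defaultTokenizer;
    }

    // Register a tokenizer for one field name; returns this for chaining.
    PerFieldTokenizers add(String field, Function<String, List<String>> tokenizer) {
        overrides.put(field, tokenizer);
        return this;
    }

    // Fields without an override fall back to the default tokenizer.
    List<String> tokenize(String field, String text) {
        return overrides.getOrDefault(field, defaultTokenizer).apply(text);
    }

    // Splits text on a separator regex and drops empty pieces.
    static List<String> split(String text, String separatorRegex) {
        List<String> out = new ArrayList<>();
        for (String piece : text.split(separatorRegex)) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Default for every field: split on any non-word character, '-' included.
        // Override for "content": split on whitespace only, so part numbers survive.
        PerFieldTokenizers analysis =
            new PerFieldTokenizers(text -> split(text, "\\W+"))
                .add("content", text -> split(text, "\\s+"));

        System.out.println(analysis.tokenize("title", "aa-bb-2 widget"));   // prints [aa, bb, 2, widget]
        System.out.println(analysis.tokenize("content", "aa-bb-2 widget")); // prints [aa-bb-2, widget]
    }
}
```

The design point is simply that the per-field choice and the tokenizing rule are independent: the same wrapper works whether the content-field tokenizer comes from a regex or from a regenerated JavaCC grammar.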