From: ami dudu
To: java-dev@lucene.apache.org
Subject: Re: Enhance StandardTokenizer to support words which will not be tokenized
Date: Wed, 3 Jun 2009 12:10:22 -0700 (PDT)
Message-ID: <23857450.post@talk.nabble.com>
In-Reply-To: <2D84B445-07F4-4232-A40B-8AFCAAB4B5B9@apache.org>
References: <23849495.post@talk.nabble.com> <2D84B445-07F4-4232-A40B-8AFCAAB4B5B9@apache.org>

This could be a good solution, but it would have to be maintained with
every update of the StandardAnalyzer rules. Is there a way to work
around that?


Grant Ingersoll-6 wrote:
>
> You'd have to modify the JFlex grammar. I'd suggest adding in a
> generic "protected words" approach whereby you can pass in a list of
> protected words.
>
> This would be a nice patch/improvement.
>
> -Grant
>
> On Jun 3, 2009, at 4:07 AM, ami dudu wrote:
>
>>
>> Hi, I'm using a StandardTokenizer which do great job for me but i
>> need to enhance it somehow to consider words like "c++" "c#", ".net"
>> as is and not tokenized it into "c" or "net".
>> I know that there are other tokenizers such as KeywordTokenizer and
>> WhitespaceTokenizer but they do not include the StandardTokenizer
>> logic.
>> Any ideas on what is the best way to add this enhancement?
>>
>> Thanks,
>> Amid
>> --
>> View this message in context:
>> http://www.nabble.com/Enhance-StandardTokenizer-to-support-words-which-will-not-be-tokenized-tp23849495p23849495.html
>> Sent from the Lucene - Java Developer mailing list archive at
>> Nabble.com.
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>

--
View this message in context: http://www.nabble.com/Enhance-StandardTokenizer-to-support-words-which-will-not-be-tokenized-tp23849495p23857450.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org