From: Stephen Thomas
Date: Tue, 29 Nov 2011 13:38:52 -0500
Subject: Re: Custom Filter for Splitting CamelCase?
To: java-user@lucene.apache.org

How do you use the WordDelimiterFilterFactory()? I tried the following code:

TokenStream out = new LowerCaseTokenizer(reader);
WordDelimiterFilterFactory wdf = new WordDelimiterFilterFactory();
out = wdf.create(out);
...

But I am getting a runtime error:

Exception in thread "main" java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z
    at org.apache.lucene.analysis.StopFilter.incrementToken(StopFilter.java:141)
    at org.apache.lucene.analysis.PorterStemFilter.incrementToken(PorterStemFilter.java:54)
    ...

I can't create an instance of WordDelimiterFilter directly, because it is protected.

Any ideas?

Thanks,
Steve

On Tue, Nov 29, 2011 at 12:44 PM, Uwe Schindler wrote:
> Hi,
>
> There is WordDelimiterFilter in Solr that was also ported to the Lucene Analysis
> module in Lucene trunk (4.0). In 3.x you can still add solr.jar to your
> classpath and use WordDelimiterFilterFactory to produce one (WordDelimiterFilter
> itself is package-private).
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: stephen.warner.thomas@gmail.com
>> [mailto:stephen.warner.thomas@gmail.com] On Behalf Of Stephen Thomas
>> Sent: Tuesday, November 29, 2011 5:20 PM
>> To: java-user@lucene.apache.org
>> Subject: Custom Filter for Splitting CamelCase?
>>
>> List,
>>
>> I have written my own CustomAnalyzer, as follows:
>>
>> public TokenStream tokenStream(String fieldName, Reader reader) {
>>
>>     // TODO: add calls to RemovePuncation, and SplitIdentifiers here
>>
>>     // First, convert to lower case
>>     TokenStream out = new LowerCaseTokenizer(reader);
>>
>>     if (this.doStopping) {
>>         out = new StopFilter(true, out, customStopSet);
>>     }
>>
>>     if (this.doStemming) {
>>         out = new PorterStemFilter(out);
>>     }
>>
>>     return out;
>> }
>>
>>
>>
>> What I need to do is write two custom filters that do the following:
>>
>> - RemovePuncation() removes all characters except [a-zA-Z], preserving case.
>> E.g.,
>>
>> "foo=bar*45;" ==> "foo bar 45"
>> "fooBar" ==> "fooBar"
>> "\"sthomas@cs.queensu.ca\"" ==> "sthomas cs queensu ca"
>>
>>
>> - SplitIdentifers() breaks up words based on camelCase notation:
>>
>> "fooBar" ==> "foo Bar"
>> "ABCCompany" ==> "ABC Company"
>>
>> (I have the regex for this.)
>>
>> Note this step must be performed before LowerCaseTokenizer, because we
>> need case information to do the splitting.
>>
>>
>> How can I write custom filters, and how do I call them before
>> LowerCaseTokenizer()?
>>
>>
>> Thanks in advance,
>> Steve
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
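
For reference, the two transformations described in the original message can be sketched in plain Java with java.util.regex, independent of Lucene. This only illustrates the string rules and is not a Lucene TokenFilter. The class name IdentifierSplitter, its method names, and both regexes are assumptions made for illustration (the regex mentioned in the message was not posted). Digits are kept here because the "foo=bar*45;" example keeps them, even though the description says [a-zA-Z].

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names and regexes are illustrative assumptions,
// not part of Lucene or Solr.
final class IdentifierSplitter {

    // Zero-width boundaries: lower-to-upper (foo|Bar), or the end of an
    // upper-case run that is followed by a capitalized word (ABC|Company).
    private static final Pattern CAMEL_BOUNDARY =
            Pattern.compile("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");

    // Any run of characters that are not ASCII letters or digits.
    private static final Pattern NON_ALNUM = Pattern.compile("[^A-Za-z0-9]+");

    // Replaces each punctuation run with one space; case is preserved.
    static String removePunctuation(String text) {
        return NON_ALNUM.matcher(text).replaceAll(" ").trim();
    }

    // Inserts a space at every camelCase boundary; case is preserved.
    static String splitCamelCase(String text) {
        return CAMEL_BOUNDARY.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(removePunctuation("foo=bar*45;")); // foo bar 45
        System.out.println(splitCamelCase("fooBar"));         // foo Bar
        System.out.println(splitCamelCase("ABCCompany"));     // ABC Company
    }
}
```

Both methods leave case untouched, so they would have to run before any lower-casing step, as the original message notes.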