From: Stephen Thomas <sthomas@cs.queensu.ca>
Date: Tue, 29 Nov 2011 13:38:52 -0500
Subject: Re: Custom Filter for Splitting CamelCase?
To: java-user@lucene.apache.org

How do you use the WordDelimiterFilterFactory()? I tried the following code:


TokenStream out = new LowerCaseTokenizer(reader);
WordDelimiterFilterFactory wdf = new WordDelimiterFilterFactory();
out = wdf.create(out);
...

But I am getting a runtime error:

Exception in thread "main" java.lang.AbstractMethodError:
org.apache.lucene.analysis.TokenStream.incrementToken()Z
	at org.apache.lucene.analysis.StopFilter.incrementToken(StopFilter.java:141)
	at org.apache.lucene.analysis.PorterStemFilter.incrementToken(PorterStemFilter.java:54)
        ...

I can't instantiate WordDelimiterFilter directly, because the class
is package-private.

Any ideas?

Thanks,
Steve


On Tue, Nov 29, 2011 at 12:44 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> Hi,
>
> There is WordDelimiterFilter in Solr that was also ported to the Lucene
> Analysis module in Lucene trunk (4.0). In 3.x you can still add solr.jar to
> your classpath and use WordDelimiterFilterFactory to produce one
> (WordDelimiterFilter itself is package-private).
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: stephen.warner.thomas@gmail.com
>> [mailto:stephen.warner.thomas@gmail.com] On Behalf Of Stephen Thomas
>> Sent: Tuesday, November 29, 2011 5:20 PM
>> To: java-user@lucene.apache.org
>> Subject: Custom Filter for Splitting CamelCase?
>>
>> List,
>>
>> I have written my own CustomAnalyzer, as follows:
>>
>> public TokenStream tokenStream(String fieldName, Reader reader) {
>>
>>     // TODO: add calls to RemovePuncation, and SplitIdentifiers here
>>
>>     // First, convert to lower case
>>     TokenStream out = new LowerCaseTokenizer(reader);
>>
>>     if (this.doStopping) {
>>         out = new StopFilter(true, out, customStopSet);
>>     }
>>
>>     if (this.doStemming) {
>>         out = new PorterStemFilter(out);
>>     }
>>
>>     return out;
>> }
>>
>>
>>
>> What I need to do is write two custom filters that do the following:
>>
>> - RemovePuncation() replaces every character except [a-zA-Z0-9] with a
>> space, preserving case. E.g.,
>>
>> "foo=bar*45;" ==> "foo bar 45"
>> "fooBar" ==> "fooBar"
>> "\"sthomas@cs.queensu.ca\"" ==> "sthomas cs queensu ca"
>>
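A minimal sketch of the punctuation step in plain Java, independent of Lucene. The class and method names are hypothetical, and it assumes digits are meant to be kept, as the first example above suggests. Each run of other characters collapses to a single space.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene or Solr.
class PunctuationStripper {
    // Matches any run of characters that are not ASCII letters or digits.
    private static final Pattern NON_ALNUM = Pattern.compile("[^A-Za-z0-9]+");

    // Replaces each such run with one space and trims the ends.
    // Letter case is left untouched.
    static String strip(String text) {
        return NON_ALNUM.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("foo=bar*45;"));
        System.out.println(strip("fooBar"));
        System.out.println(strip("\"sthomas@cs.queensu.ca\""));
    }
}
```

The trim() call is a design choice so that leading or trailing punctuation does not leave stray spaces; a tokenizer would ignore them anyway.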
>>
>> - SplitIdentifiers() breaks up words based on camelCase notation:
>>
>> "fooBar" ==> "foo Bar"
>> "ABCCompany" ==> "ABC Company"
>>
>> (I have the regex for this.)
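One possible regex for the camelCase split, sketched in plain Java. This is an assumption about what such a pattern could look like, not the regex mentioned above, and the class name is hypothetical. It uses zero-width lookarounds so that no characters are consumed at the split point.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene or Solr.
class CamelCaseSplitter {
    // Two zero-width boundaries:
    //  1. a lowercase letter or digit followed by an uppercase letter (foo|Bar)
    //  2. an uppercase letter followed by an uppercase-then-lowercase pair (ABC|Company)
    private static final Pattern BOUNDARY = Pattern.compile(
            "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");

    // Inserts a space at every boundary; letter case is preserved.
    static String split(String text) {
        return BOUNDARY.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(split("fooBar"));
        System.out.println(split("ABCCompany"));
    }
}
```

The second alternative is what keeps an acronym such as ABC together while still separating it from the word that follows.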
>>
>> Note this step must be performed before LowerCaseTokenizer, because we
>> need case information to do the splitting.
>>
>>
>> How can I write custom filters, and how do I call them before
>> LowerCaseTokenizer()?
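One way to run both steps before the tokenizer, without writing a Lucene filter at all, is to rewrite the text in the Reader before it is handed over. The sketch below uses only java.io. The class name, the method names, and both regexes are assumptions for illustration, and the input is buffered fully in memory, which may not suit very large documents.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper for illustration; not part of Lucene or Solr.
class PreTokenizerReader {
    // Applies both rewrites while case information is still available.
    static String rewrite(String text) {
        return text
                // Replace runs of non-alphanumeric characters with a space.
                .replaceAll("[^A-Za-z0-9]+", " ")
                // Insert a space at camelCase boundaries.
                .replaceAll("(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", " ")
                .trim();
    }

    // Reads the whole input into a String.
    static String drain(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Returns a new Reader over the rewritten text.
    static Reader preprocess(Reader in) throws IOException {
        return new StringReader(rewrite(drain(in)));
    }

    public static void main(String[] args) throws IOException {
        Reader r = preprocess(new StringReader("fooBar=ABCCompany;"));
        System.out.println(drain(r));
    }
}
```

In the analyzer quoted above, the result of preprocess(reader) would be passed to the LowerCaseTokenizer constructor in place of reader, with the IOException handled as appropriate. Token offsets would then refer to the rewritten text rather than the original, which matters if highlighting is used.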
>>
>>
>> Thanks in advance,
>> Steve
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org