lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Installing a custom tokenizer
Date Tue, 29 Aug 2006 20:57:44 GMT
Tucked away in the contrib section of Lucene (I'm using 2.0) there is...

org.apache.lucene.index.memory.PatternAnalyzer

which takes a regular expression and tokenizes with it. Would that help?
Word of warning: the regex matches the separators between tokens, i.e. what
is NOT a token rather than what IS a token (as I remember), which threw me
for a bit.
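A plain-JDK sketch of those separator semantics follows. It uses java.util.regex directly rather than PatternAnalyzer itself (check the contrib javadoc for that class's exact constructor), and the class name SeparatorDemo and both patterns are illustrative assumptions. Because the pattern describes the gaps, excluding the hyphen from the gap pattern is enough to keep a part number like aa-bb-2 in one piece.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

class SeparatorDemo {
    // Splits text on every match of the separator pattern and drops empty
    // pieces: the regex describes the gaps between tokens, not the tokens.
    static List<String> tokenize(Pattern separators, String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : separators.split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Order part AA-BB-2, not aa-bb-3.";
        // Any run of non-word characters is a separator, so hyphens split part numbers.
        Pattern nonWord = Pattern.compile("\\W+");
        // Hypothetical alternative: separators are characters that are neither
        // word characters nor hyphens, so hyphens stay inside tokens.
        Pattern keepHyphens = Pattern.compile("[^\\w-]+");
        System.out.println(tokenize(nonWord, text));
        System.out.println(tokenize(keepHyphens, text));
    }
}
```

With the first pattern this prints [order, part, aa, bb, 2, not, aa, bb, 3]; with the second it prints [order, part, aa-bb-2, not, aa-bb-3].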

Don't know if this is really useful, but it might work for you without as
much work...

Best
Erick@I'mNowBeyondMyCompetence.WhyDoTheyStillEmployMeHere?

On 8/29/06, Bill Taylor <wataylor@as-st.com> wrote:
>
>
> On Aug 29, 2006, at 2:47 PM, Chris Hostetter wrote:
>
> >
> > : Have a look at PerFieldAnalyzerWrapper:
> >
> > : http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html
> >
> > ...which can be specified in the constructors for IndexWriter and
> > QueryParser.
>
> As I understand it, this allows me to specify a different analyzer for
> each field name. My problem is that the standard analyzer will not
> work for my content field, so I need to define a new one. I need to
> modify the StandardTokenizer so that a part number does not need a
> digit in every other segment.
>
> For example, the StandardTokenizer splits aa-bb-2 at the hyphen between
> aa and bb because it requires every other hyphen-separated segment to
> contain a digit.
>
> I need to modify the .jj file for the StandardTokenizer and generate a
> new tokenizer from it, but I am confused by the JavaCC documentation and
> do not know how to run it to get what I need.
>
> Thanks for the help.
>
>

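Chris's PerFieldAnalyzerWrapper suggestion and Bill's part-number requirement fit together: a default tokenizer for most fields and a hyphen-friendly one for the content field. The sketch below models only that dispatch idea in plain Java and is not Lucene code. In Lucene 2.0 the real wrapper is constructed with a default Analyzer, given per-field analyzers via addAnalyzer(fieldName, analyzer), and passed to both IndexWriter and QueryParser so that indexing and query parsing tokenize each field the same way. PerFieldDemo, addTokenizer, matchAll, PART_TOKEN and WORD_TOKEN are hypothetical names, and the part-number regex is an assumed rule, not StandardTokenizer's grammar.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java stand-in for the shape of PerFieldAnalyzerWrapper:
// one default tokenizer plus optional per-field overrides.
class PerFieldDemo {
    // Hypothetical part-number rule: letters/digits joined by single hyphens
    // form ONE token; no segment is required to contain a digit.
    static final Pattern PART_TOKEN = Pattern.compile("[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*");
    // Default rule: every run of letters/digits is its own token.
    static final Pattern WORD_TOKEN = Pattern.compile("[A-Za-z0-9]+");

    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();
    private final Function<String, List<String>> defaultTokenizer;

    PerFieldDemo(Function<String, List<String>> defaultTokenizer) {
        this.defaultTokenizer = defaultTokenizer;
    }

    // Register an override for one field name.
    void addTokenizer(String fieldName, Function<String, List<String>> tokenizer) {
        perField.put(fieldName, tokenizer);
    }

    // Use the field's override if one was registered, otherwise the default.
    List<String> tokenize(String fieldName, String text) {
        return perField.getOrDefault(fieldName, defaultTokenizer).apply(text);
    }

    // Collect every match of tokenPattern in text, lowercased.
    static List<String> matchAll(Pattern tokenPattern, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = tokenPattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        PerFieldDemo fields = new PerFieldDemo(text -> matchAll(WORD_TOKEN, text));
        fields.addTokenizer("content", text -> matchAll(PART_TOKEN, text));
        System.out.println(fields.tokenize("content", "Replace aa-bb-2 today")); // [replace, aa-bb-2, today]
        System.out.println(fields.tokenize("title", "Replace aa-bb-2 today"));   // [replace, aa, bb, 2, today]
    }
}
```

Using the same object for indexing and querying is the point of the design: if only the indexer knew about the content-field rule, a query for aa-bb-2 would be split into aa, bb and 2 and would not line up with the indexed token.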