lucene-java-user mailing list archives

From "Clas Rydergren" <>
Subject Re: Modify the StandardAnalyzer
Date Sat, 06 Sep 2003 11:15:09 GMT

Hi again,

incze, I am not sure I follow you. Do you mean that I should index with the 
SimpleAnalyzer and search with the StandardAnalyzer? That is not an 
option for me, since SimpleAnalyzer does not index numbers/digits the way I 
would like (however, the StandardAnalyzer does!).
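
As a minimal sketch of the tokenization behaviour described in this thread (break at "." and other punctuation the way SimpleAnalyzer does, but keep digits), the plain-Java example below may help. It is not Lucene code and uses no Lucene classes; the class name DotSplitSketch and the method tokenize are hypothetical, and lowercasing is an assumption modelled on SimpleAnalyzer. A comparable rule inside Lucene would have to live in a custom tokenizer, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: plain Java, no Lucene dependency.
class DotSplitSketch {

    // Splits text into lowercase tokens made of letters and digits.
    // Any other character (".", ",", "@", whitespace, ...) ends the current token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush the last token if the text did not end with a separator.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("www.hotmail.com"));       // prints [www, hotmail, com]
        System.out.println(tokenize("Room 101, Building B2")); // prints [room, 101, building, b2]
    }
}
```

With this rule a search term such as "hotmail" matches a token from a web address, and numbers such as "101" are still indexed as tokens.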

I found that it should be possible to get my preferred behaviour from the 
StandardAnalyzer by modifying StandardTokenizer.jj. After modifying the 
.jj file, I ran it through JavaCC to generate the .java files. However, 
those .java files cannot be compiled together with the current 
distribution because of problems with the exception handling (see )

My next try was to compile Lucene (nightly snapshot) from scratch using Ant. 
That failed with "class collision" errors:

    [javac] Compiling 128 source files to 

duplicate class: org.apache.lucene.analysis.standard.StandardTokenizer
    [javac] public class StandardTokenizer extends org.apache.lucene.analysis.Tokenizer implements StandardTokenizerConstants {
    [javac]        ^

duplicate class: 
    [javac] public class StandardTokenizerTokenManager implements 
    [javac]        ^

So now my question is: where do I find a Lucene version that can be 
compiled? Where do I find the CVS source for Lucene 1.2 r3, or something 


> > Hi,
> >
> > I have been experimenting with Lucene for a few hours, and now I'm 
> > looking for a solution to this:
> >
> > When using the SimpleAnalyzer for indexing text, data like 
> > seems to be indexed as www, hotmail and com, which means that a search for
> > "hotmail" will return a record. This is the behavior I am looking for!
> > However, since SimpleAnalyzer does not index numbers by default, I would 
> > like to use the StandardAnalyzer. But StandardAnalyzer does not split the 
> > stream at ".".
> >
> > Ideally I should probably make my own analyzer, but that seems to be a 
> > bit complicated to me :(. What is the simplest possible modification that I
> > need to make to the Lucene source to make the StandardAnalyzer split, 
> > for example, web addresses at "." into separately indexed words?
> >
> > Can this be done by modifying StandardTokenizer.jj? How? What is 
> > the easiest way of getting such a modification into the "compiled" Lucene? 
> > Is there a need for recompiling everything?
> >
> > Appreciate all help!
> >
> > regards
> > clas
>You can stack up the two analyzers: first run the simple, then the standard
>on the output.