lucene-java-user mailing list archives

From "Clas Rydergren" <cl...@hotmail.com>
Subject Re: Modify the StandardAnalyzer
Date Sat, 06 Sep 2003 13:12:40 GMT


Sorry, I found the Lucene-1.3-rc1-src files, which compile perfectly well 
;-) and I "resolved" the original problem by including "." as a whitespace 
character in the queryParser.jj file. This may not be the best solution, 
but it solved my problem for now.

Thanks
/Clas
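
To illustrate the effect of that change, here is a minimal sketch in plain Java. It does not use Lucene or JavaCC, and the class `DotSplitDemo` and its `tokenize` method are hypothetical names made up for this example. It only shows what treating "." as whitespace means for the resulting tokens: digits are kept, and web addresses are split into separately indexed words.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Lucene: it mimics the effect of
// treating "." as a whitespace character during tokenization.
public class DotSplitDemo {

    // Splits text at whitespace and at ".", lowercasing each token.
    // Letters and digits (and any other characters) stay inside tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isSeparator = Character.isWhitespace(c) || c == '.';
            if (!isSeparator) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator ends the current token, if there is one.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [www, hotmail, com, room, 42]
        System.out.println(tokenize("www.hotmail.com room 42"));
    }
}
```

One side effect of this approach, assuming it carries over to the real grammar: anything else containing a ".", such as a decimal number or an abbreviation, is split as well, which may be why it is "maybe not the best solution".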



>Hi again,
>
>incze, I am not sure I follow you. Do you mean that I should index with the 
>SimpleAnalyzer and search with the StandardAnalyzer? That is not an 
>option for me, since SimpleAnalyzer does not index numbers/digits the way I 
>would like (however, the StandardAnalyzer does!).
>
>
>I found that modifying StandardTokenizer.jj would make it possible 
>to change the StandardAnalyzer to my preferred behaviour. After 
>modifying the .jj file, I ran it through JavaCC to get the .java files. 
>However, those .java files cannot be compiled together with the 
>current distribution because of problems with the exception handling (see 
>http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg01403.html ).
>
>My next try was to compile Lucene (nightly snapshot) from scratch using Ant. 
>That failed with "duplicate class" errors:
>
>    [javac] Compiling 128 source files to /home/clryd/lucene/jakarta-lucene/bin/classes
>    [javac] /home/clryd/lucene/jakarta-lucene/bin/src/org/apache/lucene/analysis/standard/StandardTokenizer.java:14: duplicate class: org.apache.lucene.analysis.standard.StandardTokenizer
>    [javac] public class StandardTokenizer extends org.apache.lucene.analysis.Tokenizer implements StandardTokenizerConstants {
>    [javac]        ^
>    [javac] /home/clryd/lucene/jakarta-lucene/bin/src/org/apache/lucene/analysis/standard/StandardTokenizerTokenManager.java:5: duplicate class: org.apache.lucene.analysis.standard.StandardTokenizerTokenManager
>    [javac] public class StandardTokenizerTokenManager implements StandardTokenizerConstants
>    [javac]        ^
>
>So now my question is: where do I find a Lucene version that can be 
>compiled? Where do I find the CVS source for Lucene 1.2 r3, or something 
>similar?
>
>cheers
>/Clas
>
>
>> > Hi,
>> >
>> > I have been experimenting with Lucene for a few hours, and now I'm
>> > looking for a solution to this:
>> >
>> > When using the SimpleAnalyzer for indexing text, data like
>> > www.hotmail.com seems to be indexed as www, hotmail and com, which
>> > means that a search for "hotmail" will return a record. This is the
>> > behavior I am looking for! However, since SimpleAnalyzer does not
>> > index numbers by default, I would like to use the StandardAnalyzer.
>> > But StandardAnalyzer does not split the input stream at ".".
>> >
>> > Ideally I should probably make my own analyzer, but that seems a bit
>> > complicated to me :(. What is the simplest possible modification that
>> > I need to make to the Lucene source to make the StandardAnalyzer
>> > split, for example, web addresses at "." into separately indexed
>> > words?
>> >
>> > Can this be done by modifying StandardTokenizer.jj? How? What is
>> > the easiest way of getting such a modification into the "compiled"
>> > Lucene? Is there a need to recompile everything?
>> >
>> > Appreciate all help!
>> >
>> > regards
>> > clas
>>
>>You can stack the two analyzers: first run the simple one, then the standard
>>one on the output.
>>
>>incze
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
