Return-Path: Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm Delivered-To: mailing list lucene-user@jakarta.apache.org Received: (qmail 32938 invoked from network); 6 Sep 2003 11:16:37 -0000 Received: from unknown (HELO hotmail.com) (65.54.245.157) by daedalus.apache.org with SMTP; 6 Sep 2003 11:16:37 -0000 Received: from mail pickup service by hotmail.com with Microsoft SMTPSVC; Sat, 6 Sep 2003 04:15:09 -0700 Received: from 213.114.25.174 by by1fd.bay1.hotmail.msn.com with HTTP; Sat, 06 Sep 2003 11:15:09 GMT X-Originating-IP: [213.114.25.174] X-Originating-Email: [clr72@hotmail.com] From: "Clas Rydergren" To: lucene-user@jakarta.apache.org Bcc: Subject: Re: Modify the StandardAnalyzer Date: Sat, 06 Sep 2003 11:15:09 +0000 Mime-Version: 1.0 Content-Type: text/plain; format=flowed Message-ID: X-OriginalArrivalTime: 06 Sep 2003 11:15:09.0434 (UTC) FILETIME=[221E55A0:01C37468] X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N Hi again, incze I am not sure I follow you. Do you mean that I should index with the SimpleAnalyzer and searching with the StandardAnalyzer? That is not an option for me since SimpleAnalyzer do not index number/digits the way I would like to (however, the StandardAnalyzer does!) I found that modifying the StandardTokenizer.jj would be possible in order to modify the StandardAnalyzer to my prefered behaviour. When finished modifying the .jj-file, I run it trough JavaCC to get the .java-files. However those .java-files are not possible to compile together with the current distribution because of problems with the exception handling (see http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg01403.html ) Next try was to compile Lucene (nightly snapshot) from scratch using Ant. That failed with errors with the "class collosions": [javac] Compiling 128 source files to /home/clryd/lucene/jakarta-lucene/bin/classes [javac] /home/clryd/lucene/jakarta-lucene/bin/src/org/apache/lucene/analysis/standard/StandardTokenizer.java:14: duplicate class: org.apache.lucene.analysis.standard.StandardTokenizer [javac] public class StandardTokenizer extends org.apache.lucene.analysis.Tokenizer implements StandardTokenizerConstants { [javac] ^ [javac] /home/clryd/lucene/jakarta-lucene/bin/src/org/apache/lucene/analysis/standard/StandardTokenizerTokenManager.java:5: duplicate class: org.apache.lucene.analysis.standard.StandardTokenizerTokenManager [javac] public class StandardTokenizerTokenManager implements StandardTokenizerConstants [javac] ^ So now my question is, where do I find a Lucene-verion which is possible to compile? Where do I find the source CVS f�r Lucene 1.2 r3, or something similar? cheers /Clas > > Hi, > > > > I have been experimenting with Lucene for a few hours, and now I'm >looking > > for a solution to this: > > > > When using the SimpleAnalyzer for indexing text, data like >www.hotmail.com > > seem to be indexed as www, hotmail and com which mean that a search for > > "hotmail" will return a record. This is the behavior I am looking for! > > However, since SimpleAnalyzer do not index numbers by default, I would >like > > to use the StandardAnalyzer. But, Standardanalyzer do not split the >input > > stream at ".". > > > > Ideally I should propably make my own analyser, but that seems to be a >bit > > complicated to me :(. Which is the simplest possible modification that I > > need to make to the Lucene source to make the StandardAnalyzer split, >for > > example web-addresses, at "." into separately indexed words? > > > > Can this be made by modifications to the StandardTokenizer.jj? How? What >is > > the easiest way of getting such modification into the "compiled" Lucene? >Is > > there a need for recompiling everything? > > > > Appreciate all help! > > > > regards > > clas > >You can stack up the two analyzers, first run the simple then the standard >on the poutput. > >incze > >--------------------------------------------------------------------- >To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org >For additional commands, e-mail: lucene-user-help@jakarta.apache.org > _________________________________________________________________ Tired of spam? Get advanced junk mail protection with MSN 8. http://join.msn.com/?page=features/junkmail