lucene-java-user mailing list archives

From "Ngo, Anh (ISS Southfield)" <a...@iss.net>
Subject RE: StandardAnalyzer question
Date Fri, 21 Jul 2006 19:44:26 GMT

What is the #LETTER definition in StandardTokenizer.jj?


I saw:

| <#P: ("_"|"-"|"/"|"."|",") >
| <#HAS_DIGIT:					  // at least one digit
    (<LETTER>|<DIGIT>)*
    <DIGIT>
    (<LETTER>|<DIGIT>)*
  >
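For readers unfamiliar with JavaCC syntax, the two productions quoted above can be approximated with java.util.regex. This is a hypothetical sketch, not Lucene code. It assumes <LETTER> is roughly any Unicode letter (\p{L}) and <DIGIT> any Unicode decimal digit (\p{Nd}), whereas the real grammar spells out explicit character ranges. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical approximation of two StandardTokenizer.jj productions.
 * Assumption: <LETTER> ~ \p{L} and <DIGIT> ~ \p{Nd}; the real grammar
 * lists explicit Unicode ranges instead.
 */
public class GrammarSketch {
    // <#P: ("_"|"-"|"/"|"."|",") > : one punctuation character
    static final Pattern P = Pattern.compile("[_\\-/.,]");

    // <#HAS_DIGIT: (<LETTER>|<DIGIT>)* <DIGIT> (<LETTER>|<DIGIT>)* >
    // i.e. a run of letters/digits that contains at least one digit
    static final Pattern HAS_DIGIT =
            Pattern.compile("[\\p{L}\\p{Nd}]*\\p{Nd}[\\p{L}\\p{Nd}]*");

    static boolean isP(String s) {
        return P.matcher(s).matches();
    }

    static boolean hasDigit(String s) {
        return HAS_DIGIT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasDigit("abc123")); // true
        System.out.println(hasDigit("abc"));    // false: no digit
        System.out.println(isP("_"));           // true
    }
}
```

Note that under this reading "_" appears only in #P, the punctuation class, and not among the letters.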


Should I remove "_" and recompile the source code?

Sincerely,


Anh Ngo

-----Original Message-----
From: Daniel Naber [mailto:lucenelist2005@danielnaber.de] 
Sent: Friday, July 21, 2006 2:49 PM
To: java-user@lucene.apache.org
Subject: Re: StandardAnalyzer question

On Friday 21 July 2006 16:16, Ngo, Anh (ISS Southfield) wrote:

> The Lucene 2.0.0 StandardAnalyzer treats "_" (underscore) as a token
> separator. Is there a way to make StandardAnalyzer not split on "_" or
> on other given characters?

You need to add "_" to the #LETTER definition in StandardTokenizer.jj, then
rebuild StandardTokenizer.java using the appropriate ant task.
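To illustrate what this change does, here is a hypothetical, much simplified Java sketch (not Lucene's StandardTokenizer). It treats a token as a maximal run of letters and digits, with a flag that simulates adding "_" to <LETTER>. It ignores the real grammar's NUM, HOST, EMAIL and other rules (so inputs mixing "_" with digits may tokenize differently in Lucene) and skips StandardAnalyzer's lowercasing and stop-word filtering.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration only: shows where token boundaries fall when
 * '_' is or is not part of the letter class. Not Lucene code.
 */
public class UnderscoreSplitSketch {

    // A character belongs to a token if it is a letter or digit, or if it
    // is '_' and we are simulating "_" having been added to <LETTER>.
    static boolean isTokenChar(char c, boolean underscoreIsLetter) {
        return Character.isLetterOrDigit(c) || (underscoreIsLetter && c == '_');
    }

    static List<String> tokenize(String text, boolean underscoreIsLetter) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c, underscoreIsLetter)) {
                current.append(c);
            } else if (current.length() > 0) {
                // any non-token character ends the current token
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("my_file name", false)); // [my, file, name]
        System.out.println(tokenize("my_file name", true));  // [my_file, name]
    }
}
```

Running main prints [my, file, name] and then [my_file, name].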

Regards
 Daniel

-- 
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


