lucene-java-user mailing list archives

From "Nicolas Maisonneuve" <n.maisonne...@HotPOP.com>
Subject StandardTokenizer problem
Date Thu, 04 Sep 2003 14:07:14 GMT
Hi,

When I use StandardTokenizer to parse, for example, "I.B.M",
the type of the Token is HOST and not ACRONYM.

Why?

In StandardTokenizer.jj:

  // acronyms: U.S.A., I.B.M., etc.
  // use a post-filter to remove dots
| <ACRONYM: <ALPHA> "." (<ALPHA> ".")+ >

  // hostname
| <HOST: <ALPHANUM> ("." <ALPHANUM>)+ >

"I.B.M" can be either a host or an acronym, so there is a problem, no?

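One way to see which rule wins is to replay the lexer's choice by hand. A JavaCC-generated lexer takes the longest prefix that any token rule matches and, on a tie, prefers the rule declared first in the grammar. The sketch below approximates only the two rules quoted above with java.util.regex patterns. The class name `LongestMatchSketch` and the helpers `longestPrefix` and `classify` are hypothetical, and treating `<ALPHA>` as ASCII letters and `<ALPHANUM>` as ASCII letters or digits is a simplifying assumption (the real grammar defines these classes more broadly and has other token rules too).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class LongestMatchSketch {
    // Assumed approximations of the grammar's character classes:
    // <ALPHA> = one or more ASCII letters, <ALPHANUM> = ASCII letters or digits.
    static final String ALPHA = "[A-Za-z]+";
    static final String ALPHANUM = "[A-Za-z0-9]+";

    // Rules kept in declaration order, mirroring the StandardTokenizer.jj excerpt.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        // <ACRONYM: <ALPHA> "." (<ALPHA> ".")+ >
        RULES.put("ACRONYM", Pattern.compile(ALPHA + "\\.(?:" + ALPHA + "\\.)+"));
        // <HOST: <ALPHANUM> ("." <ALPHANUM>)+ >
        RULES.put("HOST", Pattern.compile(ALPHANUM + "(?:\\." + ALPHANUM + ")+"));
    }

    // Length of the longest prefix of input that the pattern matches in full, or 0.
    static int longestPrefix(Pattern p, String input) {
        for (int end = input.length(); end > 0; end--) {
            if (p.matcher(input.substring(0, end)).matches()) {
                return end;
            }
        }
        return 0;
    }

    // JavaCC-style selection: the longest match wins; on a tie the earlier rule wins.
    static String classify(String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            int len = longestPrefix(rule.getValue(), input);
            if (len > bestLen) { // strict '>' keeps the earlier-declared rule on a tie
                best = rule.getKey();
                bestLen = len;
            }
        }
        return best == null ? "NO MATCH" : best + " \"" + input.substring(0, bestLen) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(classify("I.B.M"));   // prints HOST "I.B.M"
        System.out.println(classify("I.B.M."));  // prints ACRONYM "I.B.M."
    }
}
```

Under these assumptions, ACRONYM can consume only "I.B." from "I.B.M", because every letter group in that rule must be followed by a dot, while HOST consumes all five characters, so the longer HOST match is chosen. With a trailing dot ("I.B.M.") the ACRONYM match is one character longer than the HOST match and wins.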
----- Original Message ----- 
From: "petite_abeille" <petite_abeille@mac.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, September 04, 2003 3:19 PM
Subject: Re: Lucene app to index Java code


> Hi Erik,
> 
> On Thursday, Sep 4, 2003, at 15:03 Europe/Zurich, Erik Hatcher wrote:
> 
> > - XDoclet could be used to sweep through Java code and build a 
> > text/XML file as richly as you'd like from the information there 
> > (complete with JavaDoc tags, which Zapata will miss :)),
> 
> Correct. This happens to be on purpose :) Does XDoclet build an 
> "intertwingled" object graph of your code along the way? Performing a 
> plain search on a code base is pretty trivial... what seems to be more 
> interesting would be to put that in context.
> 
> Zapata does something along the lines of what MagicHat does for 
> Objective-C:
> 
> http://homepage.mac.com/petite_abeille/MagicHat/
> 
> But from the sound of what Otis is saying this is not what you guys are 
> looking for... back to the pampa then...
> 
> Cheers,
> 
> PA.


