lucene-java-user mailing list archives

From crspan <crs...@gmail.com>
Subject Re: index U.K. U.S. U.N. U.V.
Date Tue, 17 Jul 2007 03:16:37 GMT
Are we sure about KeywordAnalyzer here? It is supposed to "tokenize" 
the entire stream as a single token (useful for data like zip codes, 
IDs, and some product names).

In the scenario we are discussing, U.S. is just one token within the 
text, and we would still like to use StandardAnalyzer for all the 
other goodies. I am sorry for the incomplete setup in my previous message.

More or less, I expect there is somewhere we can instruct 
StandardTokenizer.jj that U.S. is a special token (even though it is 
indeed an ACRONYM) and that we prefer to index it as U.S., as is. Can we 
do that?
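
To make the question concrete, here is a minimal plain-Java sketch of the behavior being asked for. It does not use Lucene at all: the class name, the PROTECTED set, and both methods are hypothetical stand-ins, and the "standard-style" normalization is a rough approximation of what the thread describes (U.S. becoming "us"), not the actual StandardTokenizer grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ProtectedAcronymSketch {

    // Hypothetical list of tokens to index verbatim; not a Lucene feature.
    static final Set<String> PROTECTED =
            new HashSet<>(Arrays.asList("U.S.", "U.K.", "U.N.", "U.V."));

    // Keyword-style analysis: the entire input becomes one single token.
    static List<String> keywordStyle(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    // Simplified standard-style analysis with an exception list:
    // split on whitespace, keep protected tokens as-is, and otherwise
    // drop punctuation (including acronym dots) and lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (PROTECTED.contains(raw)) {
                tokens.add(raw);
                continue;
            }
            String normalized = raw.replaceAll("[^\\p{L}\\p{N}]", "")
                                   .toLowerCase(Locale.ROOT);
            if (!normalized.isEmpty()) {
                tokens.add(normalized);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The U.S. trusts us.";
        System.out.println(keywordStyle(text)); // [The U.S. trusts us.]
        System.out.println(analyze(text));      // [the, U.S., trusts, us]
    }
}
```

The keyword-style result shows why KeywordAnalyzer does not fit running text, while the second result keeps "U.S." distinct from "us". The sketch only matches a protected token when it stands alone between spaces, so "U.S.," with a trailing comma would still be normalized; queries would also need the same treatment at search time.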

Charlie



Otis Gospodnetic wrote:
> Use KeywordAnalyzer to leave "U.S." as-is and index it as-is.
>
> Otis
> --
> Lucene Consulting -- http://lucene-consulting.com/
>
>
> ----- Original Message ----
> From: crspan <crspan@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Saturday, July 14, 2007 5:18:59 PM
> Subject: index U.K. U.S. U.N. U.V.
>
> Would you please advise on the best practice for indexing:
>
>   U.S.
>
> The standard analyzer will transform it to "us", which collides with 
> "us" (we).
>
> Thanks,
>
> Charlie



