lucene-java-user mailing list archives

From <raghavendra.k....@barclays.com>
Subject WhitespaceAnalyzer vs StandardAnalyzer
Date Fri, 15 Nov 2013 20:21:45 GMT
Hi,

I implemented my Lucene solution using StandardAnalyzer for both indexing and searching. While
testing, I noticed that special characters such as hyphens and forward slashes are dropped
from the tokens by this analyzer.

In plain English, the requirement is to search for individual words; in Lucene terms, whitespace
should be the only token delimiter. Also, no part of the text should be modified or omitted.

For example: ModelNumber: ABC/x:123
Here there should be only 2 tokens, "ModelNumber:" and "ABC/x:123".
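
To make the expected behavior concrete, here is a minimal plain-Java sketch of the tokenization
I am after. It does not use Lucene at all; the class name WhitespaceTokens and the method
tokenize are hypothetical names chosen for illustration only, and the sketch assumes that any
run of whitespace counts as a single separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene): illustrates the desired tokenization only.
class WhitespaceTokens {

    // Splits on runs of whitespace; every other character is kept unchanged,
    // so punctuation such as ':' and '/' stays inside the token and case is preserved.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) { // an empty or blank input yields no tokens
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [ModelNumber:, ABC/x:123]
        System.out.println(tokenize("ModelNumber: ABC/x:123"));
    }
}
```

Note that under this behavior a query term would have to match the indexed token exactly,
including case and punctuation, which may be one of the side effects worth checking.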

Based on what I have read about WhitespaceAnalyzer, it sounds as though it can do exactly what
I am looking for. Before I make this big decision, I wanted to run it by you folks to check
whether switching the analyzer has any side effects, keeping my requirements in mind.

As always, any suggestions would be greatly appreciated.

Regards,
Raghu


