lucene-java-user mailing list archives

From Erick Erickson
Subject Re: WhitespaceAnalyzer vs StandardAnalyzer
Date Fri, 15 Nov 2013 21:44:53 GMT
Well, your example will work exactly as you want. And if your input is
strictly controlled, that's fine. But if you're indexing free text,
punctuation will be part of the token. For instance, in the sentence just
before this one, "token" would not be found, but "token." would.
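
To make that concrete, here is a rough plain-Java sketch that splits on
whitespace only. It does not use Lucene at all: WhitespaceSketch and tokenize
are made-up names, and the real WhitespaceAnalyzer may differ in details, so
treat it as an approximation of the behavior rather than the actual
implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for whitespace-only tokenization; not the Lucene API.
class WhitespaceSketch {
    // Split on runs of whitespace and leave every other character untouched.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two tokens: "ModelNumber:" and "ABC/x:123".
        System.out.println(tokenize("ModelNumber: ABC/x:123"));
        // The last token keeps its trailing period: "token."
        System.out.println(tokenize("punctuation will be part of the token."));
    }
}
```

The second call is the case to watch: the last token comes out as "token."
with the period attached, so a search for the bare word would miss it.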

The admin/analysis page is your friend :).

You might want to consider following the tokenizer with a
LowerCaseFilterFactory here, unless you want your searches to be
case-sensitive.
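
The idea behind lowercasing, sketched in plain Java rather than with the
actual filter: lowercase each term when indexing and lowercase the query term
the same way, so case no longer matters. LowerCaseSketch, index and matches
are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of case-insensitive matching; not the Lucene API.
class LowerCaseSketch {
    // "Index" a text: split on whitespace, then lowercase each term.
    static Set<String> index(String text) {
        Set<String> terms = new HashSet<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Apply the same lowercasing to the query term before the lookup.
    static boolean matches(Set<String> terms, String queryTerm) {
        return terms.contains(queryTerm.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> terms = index("ModelNumber: ABC/x:123");
        System.out.println(matches(terms, "abc/X:123"));
        System.out.println(matches(terms, "ABC"));
    }
}
```

Note that only case is normalized here. The punctuation still has to match
exactly, so "ABC" alone does not match "ABC/x:123".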

And do watch querying in this case. You need to escape things like the
colon and other special characters, see: Special
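
A minimal sketch of what escaping means here, again in plain Java. The
character list is my assumption about what the classic query parser treats as
special, so check the query syntax documentation for your version. In real
code you would normally call the static QueryParser.escape method instead of
a hand-rolled helper like the hypothetical EscapeSketch below.

```java
// Hypothetical helper that backslash-escapes query syntax characters.
class EscapeSketch {
    // Assumed set of special characters: \ + - ! ( ) : ^ [ ] " { } ~ * ? | & /
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The slash and the colon each gain a leading backslash.
        System.out.println(escape("ABC/x:123"));
    }
}
```

Without the escaping, a query parser would likely read the colon in
"ABC/x:123" as a field separator rather than as part of the term.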


On Fri, Nov 15, 2013 at 3:21 PM, <> wrote:

> Hi,
> I implemented my Lucene solution using StandardAnalyzer for both indexing
> and searching. While testing, I noticed that special characters such as
> hyphens and forward slashes are dropped by this analyzer.
> In plain English, the requirement is to search for individual words; in
> Lucene terms, whitespace should be the only token delimiter. Also, no part
> of the text should be modified or omitted.
> For example: ModelNumber: ABC/x:123
> Here there should be only 2 tokens, "ModelNumber:" and "ABC/x:123".
> Based on what I read about WhitespaceAnalyzer, it sounds as though it can
> do exactly what I am looking for. Before I make this big decision, I also
> wanted to run this by you folks to check if there are any side-effects of
> switching the Analyzer - keeping in mind my requirements.
> Any suggestions as always would be greatly appreciated.
> Regards,
> Raghu