lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michel Nadeau <aka...@gmail.com>
Subject Lower/Uppercase problem when searching in a not-analyzed field
Date Mon, 14 Dec 2009 21:36:59 GMT
Hi !

My Lucene 3.0.0 index contains a field "DOMAIN" that contains an Internet
domain name - like

* www.DomainName.com
* www.domainname.com
* www.DomainName.com/path/to/document/doc.html?a=2

This field is indexed like this -

doc.add(new Field("DOMAIN", sValue, Field.Store.YES,
Field.Index.NOT_ANALYZED));

When I search in this field, my search query looks like this:

DOMAIN:www.DomainName*

My problem is that it seems it never returns domains with uppercase letters.

For example, I display all documents (using ConstantScoreQuery), and see
this domain name: www.BidClerk.com
...So I know it's there - and so I search for: DOMAIN:www.BidC* - well it
will *never* be found !

But whatever all-lowecase domain will be found, all the time.

My guess is that the problem is the analyzer I'm using - a StandadAnalyzer:

QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", new
StandardAnalyzer(Version.LUCENE_CURRENT));
q = parser.parse(QUERY);

So here are my questions:
* Should I use a KeywordAnalyzer instead?
* If I have domains like WWW.ASK.COM, www.ask.com, www.Ask.com,
WwW.AsK.CoM- and I search for "DOMAIN:
www.ask.com" ; will they all be found whatever the case?

Thanks!

- Mike
akaris@gmail.com

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message