lucene-java-user mailing list archives

From "Jeff Plater" <jpla...@healthmarketscience.com>
Subject RE: Lower/Uppercase problem when searching in a not-analyzed field
Date Mon, 14 Dec 2009 23:35:47 GMT
The issue is that you analyze the query at search time but do not analyze the field at index
time.  The StandardAnalyzer that you use at search time lowercases the query before it is run
against the index, while the NOT_ANALYZED field keeps its original case.  I can think of two
options:

1 - use a different analyzer at search time (one that doesn't affect case - use an existing
one if there is one, or create one yourself)
2 - analyze the field at index time (optionally also storing the original value in a
non-analyzed state - if you want the original Domain)
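
To make the mismatch and option 2 concrete, here is a minimal sketch in plain Java. It only
simulates a sorted term dictionary and makes no Lucene calls; the class DomainCaseDemo and the
helpers prefixMatches and normalize are hypothetical names made up for illustration. Note that
for prefix and wildcard terms, Lucene 3.0's QueryParser also lowercases the term by default
(see setLowercaseExpandedTerms), independent of the analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical illustration only: simulates a term dictionary in plain Java, no Lucene calls.
class DomainCaseDemo {

    // Case-sensitive prefix lookup over sorted terms, similar in spirit to a prefix
    // query running against a field whose values were indexed untouched.
    static List<String> prefixMatches(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order, starting at the prefix.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    // Assumed normalization: lowercase with a fixed locale, applied to both the
    // indexed value and the query term so the two always agree.
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> domains = Arrays.asList(
                "www.BidClerk.com", "www.DomainName.com", "www.domainname.com");

        // Values indexed as-is, query lowercased: the mixed-case domain is missed.
        TreeSet<String> asIs = new TreeSet<>(domains);
        System.out.println(prefixMatches(asIs, normalize("www.BidC"))); // prints []

        // Values lowercased at index time as well: the lowercased query now matches.
        TreeSet<String> lowercased = new TreeSet<>();
        for (String domain : domains) {
            lowercased.add(normalize(domain));
        }
        System.out.println(prefixMatches(lowercased, normalize("www.BidC"))); // prints [www.bidclerk.com]
    }
}
```

The point of the sketch is only that both sides must go through the same case handling; in a
real index the original mixed-case value could still be kept in a separate stored field.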

The KeywordAnalyzer probably isn't what you want, unless you don't care about wildcard
searching: if you use it at search time you won't be able to use wildcard queries.


-Jeff


-----Original Message-----
From: Michel Nadeau [mailto:akaris@gmail.com]
Sent: Mon 12/14/2009 4:36 PM
To: java-user@lucene.apache.org
Subject: Lower/Uppercase problem when searching in a not-analyzed field
 
Hi !

My Lucene 3.0.0 index contains a field "DOMAIN" that contains an Internet
domain name - like

* www.DomainName.com
* www.domainname.com
* www.DomainName.com/path/to/document/doc.html?a=2

This field is indexed like this -

doc.add(new Field("DOMAIN", sValue, Field.Store.YES,
        Field.Index.NOT_ANALYZED));

When I search in this field, my search query looks like this:

DOMAIN:www.DomainName*

My problem is that it seems it never returns domains with uppercase letters.

For example, I display all documents (using ConstantScoreQuery), and see
this domain name: www.BidClerk.com
...So I know it's there - and so I search for: DOMAIN:www.BidC* - well it
will *never* be found !

But any all-lowercase domain will be found, every time.

My guess is that the problem is the analyzer I'm using - a StandardAnalyzer:

QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content",
        new StandardAnalyzer(Version.LUCENE_CURRENT));
q = parser.parse(QUERY);

So here are my questions:
* Should I use a KeywordAnalyzer instead?
* If I have domains like WWW.ASK.COM, www.ask.com, www.Ask.com, WwW.AsK.CoM -
  and I search for "DOMAIN:www.ask.com" - will they all be found whatever the case?

Thanks!

- Mike
akaris@gmail.com


