lucene-java-user mailing list archives

From Savvas-Andreas Moysidis <savvas.andreas.moysi...@googlemail.com>
Subject Re: Lower/Uppercase problem when searching in a not-analyzed field
Date Mon, 14 Dec 2009 23:34:32 GMT
Hi,

My guess would also be that the StandardAnalyzer lowercases your query terms,
while the field values were indexed as they are, without lowercasing.
One idea would be to use a PerFieldAnalyzerWrapper and map a
KeywordAnalyzer (which emits the whole field value as a single token, without
tokenising it) to any fields you do not want analyzed.

Remember that when indexing you no longer need to specify
Field.Index.NOT_ANALYZED for those fields, and that you need to
pass the same analyzer to the QueryParser when searching.
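
To make the "same treatment at index time and at search time" point concrete,
here is a minimal plain-Java sketch. It does not use Lucene at all: the class
CasePrefixDemo and its methods prefixSearch, normalize and buildIndex are
made-up names, and the sorted set is only a toy stand-in for the terms of a
not-analyzed field. In Lucene 3.0 the real wiring would go through
PerFieldAnalyzerWrapper.addAnalyzer("DOMAIN", new KeywordAnalyzer()). Note also
that, as far as I know, the 3.0 QueryParser lowercases prefix and wildcard
terms by default, which QueryParser.setLowercaseExpandedTerms(false) turns off,
so that setting is worth checking for a query like DOMAIN:www.BidC*.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Toy model only: a sorted set of exact terms stands in for a not-analyzed field.
class CasePrefixDemo {

    // Returns every stored term that starts with the given prefix.
    // Matching is exact and case-sensitive.
    static List<String> prefixSearch(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can start with the prefix
            }
            hits.add(term);
        }
        return hits;
    }

    // The one normalization applied on both sides (assumption: plain lowercasing).
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Stores the values as-is, or lowercased when 'lowercase' is true.
    // Duplicate terms collapse, since this models terms rather than documents.
    static TreeSet<String> buildIndex(List<String> values, boolean lowercase) {
        TreeSet<String> terms = new TreeSet<>();
        for (String value : values) {
            terms.add(lowercase ? normalize(value) : value);
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> domains = Arrays.asList(
                "www.BidClerk.com", "WWW.ASK.COM", "www.ask.com", "www.Ask.com");

        // Values stored as-is, query lowercased: the mixed-case term is missed.
        TreeSet<String> asIs = buildIndex(domains, false);
        System.out.println(prefixSearch(asIs, normalize("www.BidC")));

        // Values stored as-is, query left untouched: the term is found.
        System.out.println(prefixSearch(asIs, "www.BidC"));

        // Both sides lowercased: every case variant maps to the same term.
        TreeSet<String> lowered = buildIndex(domains, true);
        System.out.println(prefixSearch(lowered, normalize("WWW.Ask")));
    }
}
```

The design choice is the same in either setting: either leave both the stored
value and the query untouched (exact, case-sensitive matching), or lowercase
both (case-insensitive matching). Mixing the two is what makes mixed-case
values unreachable.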


savvas.

2009/12/14 Michel Nadeau <akaris@gmail.com>

> Hi !
>
> My Lucene 3.0.0 index contains a field "DOMAIN" that contains an Internet
> domain name - like
>
> * www.DomainName.com
> * www.domainname.com
> * www.DomainName.com/path/to/document/doc.html?a=2
>
> This field is indexed like this -
>
> doc.add(new Field("DOMAIN", sValue, Field.Store.YES,
> Field.Index.NOT_ANALYZED));
>
> When I search in this field, my search query looks like this:
>
> DOMAIN:www.DomainName*
>
> My problem is that it seems it never returns domains with uppercase
> letters.
>
> For example, I display all documents (using ConstantScoreQuery), and see
> this domain name: www.BidClerk.com
> ...So I know it's there - and so I search for: DOMAIN:www.BidC* - well it
> will *never* be found !
>
> But any all-lowercase domain will be found, every time.
>
> My guess is that the problem is the analyzer I'm using - a StandardAnalyzer:
>
> QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", new
> StandardAnalyzer(Version.LUCENE_CURRENT));
> q = parser.parse(QUERY);
>
> So here are my questions:
> * Should I use a KeywordAnalyzer instead?
> * If I have domains like WWW.ASK.COM, www.ask.com, www.Ask.com and
> WwW.AsK.CoM, and I search for "DOMAIN:www.ask.com", will they all be
> found regardless of case?
>
> Thanks!
>
> - Mike
> akaris@gmail.com
>
