lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject RE: wildcard uppercase
Date Thu, 12 Aug 2004 21:06:39 GMT
My guess would be 'something in the QueryParser', but I don't know for
sure.  Erik will know.... he's the fortunate guy who spent a lot of
intimate moments with QueryParser. :)
If I were you, I'd throw out QueryParser out of the equation by using
the Lucene API (various Query classes) directly, instead of relying on
QueryParser.  Then you will see if your query is still getting
lower-cased or not.  If it's not, it's the QueryParser that's doing it.

Otis

--- "Kipping, Peter" <pkipping@crcpress.com> wrote:

> Correct me if I'm wrong but the WhiteSpace Analyzer doesn't
> lowercase.
> As I mentioned in my previous email, that's the one I'm using.  When
> I
> don't use the wildcard everything works fine:
> 
>         IndexSearcher is = new
> IndexSearcher("C:/J2EE_Projects/Lucene/indexDirCompound");
>         PerFieldAnalyzerWrapper pw = new PerFieldAnalyzerWrapper(new
> StandardAnalyzer());
>         pw.addAnalyzer("molecular_formula", new
> WhitespaceAnalyzer());
>         String sr = "C9H10O5";
>         Query q = QueryParser.parse(sr, "molecular_formula", pw);
>         System.out.println(q.toString());
> 
> The last line generates the following:
> molecular_formula:C9H10O5
> 
> Now if I change the sr String:
> 
>         IndexSearcher is = new
> IndexSearcher("C:/J2EE_Projects/Lucene/indexDirCompound");
>         PerFieldAnalyzerWrapper pw = new PerFieldAnalyzerWrapper(new
> StandardAnalyzer());
>         pw.addAnalyzer("molecular_formula", new
> WhitespaceAnalyzer());
>         String sr = "C9H10O5*";//only change is adding the *
>         Query q = QueryParser.parse(sr, "molecular_formula", pw);
>         System.out.println(q.toString());
> 
> I get this:
> molecular_formula:c9h10o5*
> 
> As you can see it's been lower cased and I get no hits.  Looks like
> something is lowercasing the wildcard query.  How can I make it not
> do
> that?
> 
> Thanks,
> Peter
> 
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
> Sent: Thursday, August 12, 2004 3:03 PM
> To: Lucene Users List
> Subject: Re: wildcard uppercase
> 
> Just use an Analyzer that doesn't lowercase.  That FAQ entry assumes
> that the Analyzer does lowercase its input.
> Searching IS case sensitive, it's just that people often use an
> Analyzer that lowercases everything (at indexing and at query time),
> so
> the search appears not to be case sensitive, and that is just what
> most
> people want.
> 
> Otis
> 
> --- "Kipping, Peter" <pkipping@crcpress.com> wrote:
> 
> > I'm doing wildcard searches on molecular formulas where case is
> > critical.  For instance Co = Cobalt, CO = Carbon Monoxide.  I've
> read
> > the faq on this:
> > 
> > 
> > Yes, unlike other types of Lucene queries, Wildcard, Prefix, and
> > Fuzzy
> > queries are case sensitive. 
> > 
> > That is because those types of queries are not passed through the
> > Analyzer, which is the component that performs operations such as
> > stemming and lowercasing. 
> > 
> > The reason for skipping the Analyzer is that if you were searching
> > for
> > "dogs*" you would not want "dogs" first stemmed to "dog", since
> that
> > would then match "dog*", which is not the intended query. 
> > A workaround for this is simply to lowercase the entire query
> before
> > passing it to the query parser. 
> > 
> > 
> > But it makes no sense.  First most analyzers don't even do
> stemming.
> > I'm using the whitespace analyzer which doesn't.  Second
> lowercasing
> > is
> > a completely separate issue from stemming, I see no reason why the
> a
> > wildcard query has to be lowercased.  Is there any way to prevent
> my
> > wildcard queries from being lowercased?  Example:  String input
> > "C9H10O5*", resulting query "c9h10o5*"
> > 
> > Thanks,
> > 
> > Peter
> > 
> > 
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail:
> lucene-user-help@jakarta.apache.org
> > 
> > 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message