lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: wildcard uppercase
Date Fri, 13 Aug 2004 01:02:46 GMT
Query.toString() is your friend!  So is troubleshooting without
QueryParser in the picture.

But, Daniel to the rescue :)

	Erik
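
To make the failure mode in Peter's output below concrete, here is a toy
model in plain Java. It is not Lucene code: a sorted set of strings stands
in for the index's terms, which keep their case the way a non-lowercasing
analyzer leaves them. The names WildcardCaseDemo and matchTrailingWildcard,
and the extra term C9H10O5S, are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model only: a sorted set of strings stands in for an index's terms.
final class WildcardCaseDemo {

    // Returns every indexed term matching a pattern that ends in '*'.
    // When lowercasePattern is true the pattern is lowercased first, which
    // mimics what the thread observes happening to wildcard queries.
    static List<String> matchTrailingWildcard(SortedSet<String> indexedTerms,
                                              String pattern,
                                              boolean lowercasePattern) {
        if (!pattern.endsWith("*")) {
            throw new IllegalArgumentException("pattern must end with *: " + pattern);
        }
        String prefix = pattern.substring(0, pattern.length() - 1);
        if (lowercasePattern) {
            prefix = prefix.toLowerCase(Locale.ROOT);
        }
        List<String> hits = new ArrayList<>();
        // tailSet starts at the first term >= prefix; terms sharing the
        // prefix are contiguous in sorted order, so stop at the first miss.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Case-preserved terms; C9H10O5S is a hypothetical extra entry.
        SortedSet<String> terms =
            new TreeSet<>(Arrays.asList("C9H10O5", "C9H10O5S", "CO", "Co"));
        // prints [C9H10O5, C9H10O5S]
        System.out.println(matchTrailingWildcard(terms, "C9H10O5*", false));
        // prints []
        System.out.println(matchTrailingWildcard(terms, "C9H10O5*", true));
    }
}
```

The matching is case sensitive in both calls; only the lowercased pattern
misses, because no indexed term starts with "c9h10o5". In Lucene itself,
taking QueryParser out of the picture means building the query by hand
(for example a WildcardQuery from a Term) and printing its toString(),
which is the check Otis suggests below.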


On Aug 12, 2004, at 5:06 PM, Otis Gospodnetic wrote:

> My guess would be 'something in the QueryParser', but I don't know for
> sure.  Erik will know.... he's the fortunate guy who spent a lot of
> intimate moments with QueryParser. :)
> If I were you, I'd take QueryParser out of the equation by using
> the Lucene API (various Query classes) directly, instead of relying on
> QueryParser.  Then you will see if your query is still getting
> lower-cased or not.  If it's not, it's the QueryParser that's doing it.
>
> Otis
>
> --- "Kipping, Peter" <pkipping@crcpress.com> wrote:
>
>> Correct me if I'm wrong, but the WhitespaceAnalyzer doesn't lowercase.
>> As I mentioned in my previous email, that's the one I'm using.  When I
>> don't use the wildcard, everything works fine:
>>
>>         IndexSearcher is =
>>             new IndexSearcher("C:/J2EE_Projects/Lucene/indexDirCompound");
>>         PerFieldAnalyzerWrapper pw =
>>             new PerFieldAnalyzerWrapper(new StandardAnalyzer());
>>         pw.addAnalyzer("molecular_formula", new WhitespaceAnalyzer());
>>         String sr = "C9H10O5";
>>         Query q = QueryParser.parse(sr, "molecular_formula", pw);
>>         System.out.println(q.toString());
>>
>> The last line generates the following:
>> molecular_formula:C9H10O5
>>
>> Now if I change the sr String:
>>
>>         IndexSearcher is =
>>             new IndexSearcher("C:/J2EE_Projects/Lucene/indexDirCompound");
>>         PerFieldAnalyzerWrapper pw =
>>             new PerFieldAnalyzerWrapper(new StandardAnalyzer());
>>         pw.addAnalyzer("molecular_formula", new WhitespaceAnalyzer());
>>         String sr = "C9H10O5*"; // only change is adding the *
>>         Query q = QueryParser.parse(sr, "molecular_formula", pw);
>>         System.out.println(q.toString());
>>
>> I get this:
>> molecular_formula:c9h10o5*
>>
>> As you can see, it's been lowercased and I get no hits.  Looks like
>> something is lowercasing the wildcard query.  How can I make it not do
>> that?
>>
>> Thanks,
>> Peter
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
>> Sent: Thursday, August 12, 2004 3:03 PM
>> To: Lucene Users List
>> Subject: Re: wildcard uppercase
>>
>> Just use an Analyzer that doesn't lowercase.  That FAQ entry assumes
>> that the Analyzer does lowercase its input.
>> Searching IS case sensitive; it's just that people often use an
>> Analyzer that lowercases everything (at indexing and at query time), so
>> the search appears not to be case sensitive, and that is just what most
>> people want.
>>
>> Otis
>>
>> --- "Kipping, Peter" <pkipping@crcpress.com> wrote:
>>
>>> I'm doing wildcard searches on molecular formulas where case is
>>> critical.  For instance, Co = Cobalt, CO = Carbon Monoxide.  I've read
>>> the FAQ on this:
>>>
>>>
>>> Yes, unlike other types of Lucene queries, Wildcard, Prefix, and Fuzzy
>>> queries are case sensitive.
>>>
>>> That is because those types of queries are not passed through the
>>> Analyzer, which is the component that performs operations such as
>>> stemming and lowercasing.
>>>
>>> The reason for skipping the Analyzer is that if you were searching for
>>> "dogs*" you would not want "dogs" first stemmed to "dog", since that
>>> would then match "dog*", which is not the intended query.
>>> A workaround for this is simply to lowercase the entire query before
>>> passing it to the query parser.
>>>
>>>
>>> But it makes no sense.  First, most analyzers don't even do stemming.
>>> I'm using the whitespace analyzer, which doesn't.  Second, lowercasing
>>> is a completely separate issue from stemming; I see no reason why a
>>> wildcard query has to be lowercased.  Is there any way to prevent my
>>> wildcard queries from being lowercased?  Example:  String input
>>> "C9H10O5*", resulting query "c9h10o5*"
>>>
>>> Thanks,
>>>
>>> Peter
>>>
>>>
>>>
>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>> For additional commands, e-mail:
>> lucene-user-help@jakarta.apache.org
>>>
>>>

