lucene-java-user mailing list archives

From "Landon Cox" <l...@interactive-media.com>
Subject RE: QueryParser question - case-sensitivity
Date Thu, 09 May 2002 20:06:36 GMT
Hi DaveP and Otis,

After looking further, my take:

It would be symmetrical, except that the ID field (term) I'm looking for was
indexed as a Keyword since it's a file path that I don't want tokenized.  I
think what's happening is this: even though I'm using StandardAnalyzer for
both indexing and querying, a Keyword isn't tokenized, so at indexing time
the value never goes through StandardAnalyzer's lower case filter and is
therefore stored with its original case intact.

On the flip side, the query parser doesn't know which field names map to
which field types (in this case a Keyword), so it runs every query term
through StandardAnalyzer without regard to field name and type (as it was
designed to).

So, I think you're right - it comes down to the analyzer, but more directly,
I think it comes down to the fact that the Keyword value is unmolested when
indexed but the query term value after going through QueryParser.parse() is
lower-case due to the LowerCaseFilter that StandardAnalyzer uses.
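
To make the mismatch concrete, here is a minimal sketch in plain Java. It is
not Lucene code: the class and method names are hypothetical stand-ins, the
example path is made up, and the sketch models only the lowercasing step
(StandardAnalyzer would also split the text into tokens).

```java
import java.util.Locale;

// Hypothetical stand-in for the behavior described above; NOT Lucene classes.
class CaseMismatchSketch {

    // A Keyword field is not tokenized, so the value is indexed exactly as given.
    static String indexKeyword(String value) {
        return value;
    }

    // The query side runs every term through the analyzer, which lowercases it.
    // (Assumption: only the lowercasing step is modeled here.)
    static String analyzeQueryTerm(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // A term matches only when the query term equals the indexed term exactly.
    static boolean matches(String indexedTerm, String queryTerm) {
        return indexedTerm.equals(queryTerm);
    }
}
```

With this model, indexKeyword("/Docs/Report.TXT") keeps its case while
analyzeQueryTerm("/Docs/Report.TXT") does not, so matches(...) on the two
results is false even though both started from the same string.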

For a keyword field, the docs say:
Keyword
public static final Field Keyword(String name,
                                  String value)
Constructs a String-valued Field that is not tokenized, but is indexed and
stored. Useful for non-text fields, e.g. date or url.

If you look at the StandardAnalyzer source, the tokenStream method runs its
input through LowerCaseFilter as spec'd.  But since a Keyword is not
tokenized, its value is stored/indexed with its case intact.

Does that jibe with your knowledge of the source and behavior of the
classes?

It does look like I need to make a query analyzer that's a little more
"aware" of my field names (and types) for querying purposes...that analyzer
would match the behavior on the indexing side such that it knows what fields
are Keywords and therefore whether to pass them through unchanged or not.
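
A rough sketch of that idea, again in plain Java rather than against the
Lucene API. The class name, the normalize method, and the field names "ID" and
"contents" are all hypothetical; a real version would be written as an
Analyzer that decides per field name how to treat the text.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical field-aware query-side normalizer; NOT a Lucene class.
class FieldAwareQuerySketch {

    // Names of the fields that were indexed as Keywords (untokenized).
    private final Set<String> keywordFields;

    FieldAwareQuerySketch(Set<String> keywordFields) {
        this.keywordFields = new HashSet<>(keywordFields);
    }

    // Keyword fields pass through unchanged, mirroring the indexing side;
    // every other field is lowercased, as the analyzer would do.
    String normalize(String field, String text) {
        if (keywordFields.contains(field)) {
            return text;
        }
        return text.toLowerCase(Locale.ROOT);
    }
}
```

Constructed with a set containing "ID", normalize("ID", "/Docs/Report.TXT")
returns the path unchanged, while normalize("contents", "Hello World") returns
"hello world".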

Thanks for the feedback.

Landon

PS. Late break: I just read the mail from Doug after writing this analysis.
I think it confirms what was going on.  Thank you, Doug.


|-----Original Message-----
|From: Landon Cox [mailto:lcox@interactive-media.com]
|Sent: Thursday, May 09, 2002 12:29 PM
|To: Lucene Users List
|Subject: RE: QueryParser question - case-sensitivity
|
|
|
|Hi Otis,
|
|On both the indexing side and creation of the query parser, I'm using the
|StandardAnalyzer class.  Seems like it would be symmetrical with respect to
|case sensitivity, so either the analyzer isn't related to the problem or
|it's a bug...I suspect the former.  I'll start looking at the source next.
|Thanks,
|
|Landon
|
||-----Original Message-----
||From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
||Sent: Thursday, May 09, 2002 11:24 AM
||To: Lucene Users List
||Subject: Re: QueryParser question - case-sensitivity
||
||
||Wouldn't that be the Analyzer that you are using?
||I don't have the source handy to check it for you, but look for
||toLowerCase or some such, and you'll find who's messing with your
||queries.
||Replace that piece, and you'll keep your upper cases.
||
||Otis



