lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject RE: Indexing/Analyzer question - case-insensitive "contains" search
Date Mon, 30 Jul 2007 14:12:31 GMT
Hello,

> Hi everyone,
> 
> I told you I'd be back with more questions!  :-)
> Here is my situation. In my application, the field to be searched is
> selected via a drop-down box. I want my searches to basically 
> be "contains"
> searches - I take what the user typed in, put a wildcard 
> character at the
> beginning and end, and put that in a WildcardQuery with the 
> selected field.
> In simple cases, this works great.

It does sound very strange to me, to default to a WildCardQuery! Suppose I am looking for
"bold", I am getting hits for "old". 

IMO, you should move the WildcardQuery and just use a simple QueryParser (with the analyzer
you use at indexing time). Your problem below arises from the fact that you construct your
search with WildcardQuery(Term t) and t = new Term("field","Joe's");

But, now you are looking for a term that is very likely not to be present in the index, although
you idnexed text that contains "Joe's". The StandardAnalyzer() for example would probably
split in ' , and ignores the s. If you use queryparser instead of creating your own term,
you are save (and you have no problems with case-sensitive either).

If you do not really understand why it works like this, it might be good to play around with
luke: open your index with luke, go to plugins tab, and put in some text and see how it is
tokenized with some sample analyzers. 

Regards Ard

> 
> But, the StandardAnalyzer and SimpleAnalyzer is removing some 
> characters I
> need. For example, one of my objects has a name of "Joe's 
> Devices". If I
> search for "Joe's", it doesn't work, because the apostrophe 
> is stripped out.
> I tried using the KeywordAnalyzer, which keeps the string 
> intact, but then
> won't my searches be case-sensitive? This is easy to fix of course by
> calling toLowerCase() on the text when it is indexed, but 
> then later when
> retrieved from the index to be displayed in the search results, "Joe's
> Devices" is displayed as "joe's devices". Is there anything I 
> can do here
> short of putting two copies of the name in the document - one 
> indexed/not
> stored ("joe's devices"), and one stored/not indexed("Joe's 
> Devices") ? Or
> can I accomplish this case-insensitive "contains" search some 
> other way -
> would I have to write a custom Analyzer, or something?
> 
> Thanks in advance!
> 
> -- 
> Joe Attardi
> jattardi@gmail.com
> http://thinksincode.blogspot.com/
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message