lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Byrne <john.by...@propylon.com>
Subject Re: case insensitivity
Date Wed, 25 Jun 2008 13:59:32 GMT
What I had in mind was actually very simple: when you create a Term 
(programatically) you normally set the text and the field. I would also 
like to be able to set the case sensitivity to true or false for that 
specific Term object.

I imangined (and maybe I am over simplifying it!) that somewhere in the 
API there must be a string comparison using 'String.equals()' that 
determines if a document contains the term or not - and that use of 
'equals()' has permanently locked Lucene into case-sensitive searching. 
The values being compared could be first lower-cased (or 
equalsIgnoreCase could be used) depending on the value of a boolean flag 
in the Term object.

If that option was there, there would be no need to ever change the case 
in the analyzer - you'd be able to control case-sensitivity regardless 
of the field used.

Of course, I realize that there is currently no way to take advantage of 
such a feature in the QueryParser. It could only be done 
programatically. But I don't think that's a reason not to do it, since 
the API already has features that aren't implemented in the QueryParser 
(like SpanQuerys). In a perfect world, the parser would support all the 
features, but for the time being anyone who wants to take advantage of 
the newer features has to find an alternative anyway.

The problem that it would solve for me is, as I mentioned, that I could 
mix case-sensitive Terms with case-insensitive Terms when using 
SpanQuerys. I currently have no way to do that.

Regards,
-John

Erick Erickson wrote:
> Well, it depends on what you mean by "per term". There's already
> PerFieldAnalyzerWrapper for each field, but I don't think that's what
> you want.
>
> How do you expect a per term analyzer to behave? I'm having a hard
> time thinking of a use case that's general. You could always
> roll your own analyzer that didn't change case for your particular
> list of words.
>
> But the problem is your users. In your example, suppose a user
> typed in "dell computers". Would that match "Dell computers"?
> Does your analyzer automatically upper-case some words? If it
> does, that's the same as lower casing them all. If it doesn't,
> how do you explain that to your users?
>
> All in all, I'm having a tough time imagining how this would work.
> It's easy enough to say "let's assume", but I suspect that
> whatever solution satisfied your example will have its own problems
> that are far worse than just lower-casing things.
>
> Best
> Erick
>
>
> On Wed, Jun 25, 2008 at 5:37 AM, John Byrne <john.byrne@propylon.com> wrote:
>
>   
>> Hi,
>>
>> I know that case-insensitive searching is normally done by creating an
>> all-lower-case version of the documents, and turning the search terms into
>> lower case whenever this field is searched, but this approach has it's
>> disadvantages.
>>
>> Let's say, for example, you want to find "Dell" (with a capital "D"), near
>> "computers" (with or without capitals, ie. in any case). The problem is that
>> you would need to use a SpanQuery to find terms near each other; but if the
>> case-sensitivity required is different for each term, then they will be in
>> different fields, making the use of SpanQuerys inpossible.
>>
>> There might be ways to work around this, but my question is: will
>> case-insensitvity ever be added to Lucene as per-Term option? If not, can
>> anyone tell me where I should start looking in order to make this change
>> myself?
>>
>> Thanks!
>>
>> -JB
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>     
>
>   
> ------------------------------------------------------------------------
>
> No virus found in this incoming message.
> Checked by AVG. 
> Version: 7.5.524 / Virus Database: 270.4.1/1517 - Release Date: 24/06/2008 20:41
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message