lucene-java-user mailing list archives

From Apurv Verma <dapu...@gmail.com>
Subject Re: Case Insensitive Matching in Solr/Lucene
Date Tue, 25 Nov 2014 13:05:11 GMT
Hey Michael,
 Thanks for your reply. My use case is a little different: I would like
facet queries to return the original values, but I would like filter
queries to match case-insensitively.

For example, I need facet queries to return Quick, The, brown, ...
but I want a filter query such as fq=Term:"quick" to match "Quick" as well.
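One way to get both behaviours is the two-field idea Ahmet suggests further down the thread: facet on a field that keeps the original tokens, and filter on a lowercased copy of the same text. The plain-Java sketch below is a toy model of that split, not Solr's faceting or filter-query machinery. The `Doc` record, its method names, and the sample texts are hypothetical (Java 16+ for records).

```java
import java.util.*;

// Toy model (not Solr's API): each document keeps its original tokens for
// faceting and a lowercased copy for filtering, mimicking two fields
// populated from the same source text.
public class FacetOriginalFilterLowercase {

    // One document with a case-preserving "field" and a lowercased "field".
    public record Doc(String id, List<String> original, Set<String> lowered) {
        public static Doc of(String id, String text) {
            List<String> tokens = Arrays.asList(text.trim().split("\\s+"));
            Set<String> lower = new HashSet<>();
            for (String t : tokens) lower.add(t.toLowerCase(Locale.ROOT));
            return new Doc(id, tokens, lower);
        }
    }

    // Filter step: compare against the lowercased field, so "quick" matches "Quick".
    public static List<Doc> filter(List<Doc> docs, String term) {
        String q = term.toLowerCase(Locale.ROOT);
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) if (d.lowered().contains(q)) hits.add(d);
        return hits;
    }

    // Facet step: count matching documents per original (case-preserved) token.
    public static Map<String, Integer> facetCounts(List<Doc> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc d : docs) {
            for (String t : new LinkedHashSet<>(d.original())) { // once per doc
                counts.merge(t, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            Doc.of("Doc_1", "The quick brown fox jumped over the lazy dog"),
            Doc.of("Doc_2", "Quick brown foxes leap over lazy dogs in summer"));
        List<Doc> hits = filter(docs, "quick");
        System.out.println(hits.size()); // 2
        System.out.println(facetCounts(hits)); // keys keep original case, e.g. Quick, The
    }
}
```

The facet keys come from the untouched tokens, so "Quick" and "The" survive, while the filter never sees case at all.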

Also, could you please point me to some additional links on how I can index
different variants of a token at the same position?
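In Lucene, an extra token is placed at the same position as the previous one by giving it a position increment of 0 (through `PositionIncrementAttribute`), which is also how synonym filters stack tokens. The sketch below models that idea in plain Java rather than with Lucene's `TokenFilter` API. The `Token` record and the `analyze`/`positions` methods are hypothetical, and whitespace splitting stands in for a real tokenizer.

```java
import java.util.*;

// Toy token stream (not Lucene's TokenStream API): each token carries a
// position increment, and a lowercased variant is emitted with increment 0
// so it occupies the same position as the original.
public class StackedCaseVariants {

    public record Token(String term, int positionIncrement) {}

    // Whitespace-tokenize, then emit the lowercase variant at the same position
    // whenever it differs from the original token.
    public static List<Token> analyze(String text) {
        List<Token> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            out.add(new Token(t, 1));
            String lower = t.toLowerCase(Locale.ROOT);
            if (!lower.equals(t)) {
                out.add(new Token(lower, 0)); // stacked: same position as t
            }
        }
        return out;
    }

    // Resolve increments into absolute positions (first token at position 0).
    public static Map<Integer, List<String>> positions(List<Token> tokens) {
        Map<Integer, List<String>> byPos = new TreeMap<>();
        int pos = -1;
        for (Token tok : tokens) {
            pos += tok.positionIncrement();
            byPos.computeIfAbsent(pos, k -> new ArrayList<>()).add(tok.term());
        }
        return byPos;
    }

    public static void main(String[] args) {
        System.out.println(positions(analyze("Quick brown foxes")));
        // {0=[Quick, quick], 1=[brown], 2=[foxes]}
    }
}
```

If queries against such a field are not lowercased, a query for "Quick" matches only documents that originally contained "Quick", while a query for "quick" matches both spellings.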


--
Regards,
Apurv Verma



On Tue, Nov 25, 2014 at 6:26 PM, Michael Sokolov <
msokolov@safaribooksonline.com> wrote:

> right -- missed Ahmet's answer there in my haste to respond ...
>
> -Mike
>
>
> On 11/25/14 6:56 AM, Ahmet Arslan wrote:
>
>> Hi Apurv,
>>
>> I wouldn't worry about index size; it does not grow linearly (i.e. double)
>> like that.
>> Please see a similar discussion:
>> https://issues.apache.org/jira/browse/LUCENE-5620
>>
>> Ahmet
>>
>>
>> On Tuesday, November 25, 2014 1:46 PM, Ahmet Arslan
>> <iorixxx@yahoo.com.INVALID> wrote:
>>
>>
>>
>> Hi Apurv,
>>
>> You can create an additional field for case-sensitive search and then
>> switch between the fields at query time. You will have two fields (text_ci
>> and text_lower) with different analysers, both populated via copyField.
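The two-field approach can be sketched as a toy model: the same text is copied into a case-sensitive field and a lowercased field, each with its own analysis, and the caller picks the field at query time, analyzing the query term the same way as that field. This is plain Java, not Solr's copyField or Lucene's per-field analyzer classes; the field names `text_cs` and `text_lower` and the class itself are hypothetical.

```java
import java.util.*;
import java.util.function.UnaryOperator;

// Toy model of the two-field approach: the same source text is indexed into
// a case-sensitive field and a lowercased field, and the field is chosen at
// query time. Field names are hypothetical.
public class TwoFieldCaseSwitch {

    // Per-field analysis: identity for the case-sensitive field, lowercasing
    // for the case-insensitive one. The same function is applied to queries.
    private static final Map<String, UnaryOperator<String>> ANALYSIS = Map.of(
        "text_cs", t -> t,
        "text_lower", t -> t.toLowerCase(Locale.ROOT));

    // field -> term -> ids of docs containing that term
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    // Like copyField: index the same source text into every field.
    public void add(String docId, String text) {
        for (var field : ANALYSIS.entrySet()) {
            Map<String, Set<String>> terms =
                index.computeIfAbsent(field.getKey(), k -> new HashMap<>());
            for (String tok : text.trim().split("\\s+")) {
                String term = field.getValue().apply(tok);
                terms.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Query-time switch: choose the field, analyze the query term the same way.
    public Set<String> search(String query, boolean caseSensitive) {
        String field = caseSensitive ? "text_cs" : "text_lower";
        String term = ANALYSIS.get(field).apply(query);
        return index.getOrDefault(field, Map.of()).getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        TwoFieldCaseSwitch idx = new TwoFieldCaseSwitch();
        idx.add("Doc_1", "The quick brown fox jumped over the lazy dog");
        idx.add("Doc_2", "Quick brown foxes leap over lazy dogs in summer");
        System.out.println(idx.search("quick", true));  // [Doc_1]
        System.out.println(idx.search("quick", false)); // [Doc_1, Doc_2]
    }
}
```

The key design point is that each field's query terms go through the same analysis as its indexed terms; mixing them (e.g. an unlowercased query against the lowercased field) would miss matches.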
>>
>> Ahmet
>>
>>
>>
>> On Tuesday, November 25, 2014 1:39 PM, Apurv Verma <apurv@bloomreach.com>
>> wrote:
>> Hey all,
>> The standard way to do a case-insensitive match in Lucene is to use a
>> lowercase filter at both index and query time. However, this does not
>> preserve the original casing of the indexed terms. For example, suppose my
>> inverted index is:
>>
>> Term      Doc_1  Doc_2
>> -------------------------
>> Quick   |       |  X
>> The     |   X   |
>> brown   |   X   |  X
>> dog     |   X   |
>> dogs    |       |  X
>> fox     |   X   |
>> foxes   |       |  X
>> in      |       |  X
>> jumped  |   X   |
>> lazy    |   X   |  X
>> leap    |       |  X
>> over    |   X   |  X
>> quick   |   X   |
>> summer  |       |  X
>> the     |   X   |
>> ------------------------
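The table can be reproduced with a toy inverted index, assuming the two documents read "The quick brown fox jumped over the lazy dog" and "Quick brown foxes leap over lazy dogs in summer" (texts reconstructed from the table; they are not given in the message). This is plain Java, not Lucene's index format. It also shows a case-insensitive lookup done at query time by collecting the postings of every case variant of the query term.

```java
import java.util.*;

// Toy inverted index (not Lucene's data structures): term -> sorted doc ids.
// The two document texts are reconstructed to be consistent with the table.
public class ToyInvertedIndex {

    // TreeMap keeps terms in natural String order, which puts uppercase
    // before lowercase, matching the row order of the table.
    public static SortedMap<String, SortedSet<String>> build(Map<String, String> docs) {
        SortedMap<String, SortedSet<String>> index = new TreeMap<>();
        docs.forEach((id, text) -> {
            for (String tok : text.trim().split("\\s+")) {
                index.computeIfAbsent(tok, k -> new TreeSet<>()).add(id);
            }
        });
        return index;
    }

    // Case-insensitive lookup over a case-sensitive index: scan the terms and
    // union the postings of every case variant of the query.
    public static SortedSet<String> lookupIgnoreCase(
            SortedMap<String, SortedSet<String>> index, String query) {
        SortedSet<String> hits = new TreeSet<>();
        index.forEach((term, ids) -> {
            if (term.equalsIgnoreCase(query)) hits.addAll(ids);
        });
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("Doc_1", "The quick brown fox jumped over the lazy dog");
        docs.put("Doc_2", "Quick brown foxes leap over lazy dogs in summer");
        SortedMap<String, SortedSet<String>> index = build(docs);
        index.forEach((term, ids) -> System.out.printf("%-8s %s%n", term, ids));
        System.out.println(lookupIgnoreCase(index, "quick")); // [Doc_1, Doc_2]
    }
}
```

Scanning every term is fine for a toy, but its cost grows with the size of the term dictionary, which is why lowercasing at index time (or a separate lowercased field) is the usual approach.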
>>
>> Is it possible to choose between case-insensitive and case-sensitive
>> matching at query time? The index is stored in memory in Solr. My question
>> is: if it is stored as a hashmap with string keys, can I override hashCode
>> so that "Quick" and "quick" return the same hash value?
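Lucene's terms dictionary is a sorted structure rather than a Java hash map, so there is no `hashCode` hook to override. The closest plain-Java analogue of the idea is a map whose key comparison ignores case, such as a `TreeMap` built with `String.CASE_INSENSITIVE_ORDER`. The toy sketch below (hypothetical class, not Lucene code) shows the catch: the case folding happens while the index is built, so "Quick" and "quick" collapse into one entry, only the first-seen spelling survives as the key, and case-sensitive matching can no longer be chosen at query time.

```java
import java.util.*;

// Toy illustration of the "override hashCode" idea using a map whose key
// comparison ignores case. This is plain Java, not how Lucene stores terms.
public class CaseFoldedKeys {

    public static TreeMap<String, SortedSet<String>> build(Map<String, String> docs) {
        // Keys "Quick" and "quick" compare equal, so they share one entry.
        TreeMap<String, SortedSet<String>> index =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        docs.forEach((id, text) -> {
            for (String tok : text.trim().split("\\s+")) {
                index.computeIfAbsent(tok, k -> new TreeSet<>()).add(id);
            }
        });
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("Doc_1", "The quick brown fox");
        docs.put("Doc_2", "Quick brown foxes");
        TreeMap<String, SortedSet<String>> index = build(docs);
        System.out.println(index.get("QUICK")); // [Doc_1, Doc_2]
        System.out.println(index.keySet());     // [brown, fox, foxes, quick, The]
        // The spelling "Quick" from Doc_2 is no longer stored anywhere.
    }
}
```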
>>
>> Has anyone attempted this before? Is my assumption about the index right?
>> Which classes and code flow should I look at?
>>
>>
>
