lucene-java-user mailing list archives

From Apurv Verma <dapu...@gmail.com>
Subject Re: Case Insensitive Matching in Solr/Lucene
Date Tue, 25 Nov 2014 11:52:13 GMT
Hi Ahmet,
 Thanks for your reply. Creating two separate fields, one containing the
original value and the other the lowercased value, is a viable solution.
But it bloats the index (roughly 2x).
I am looking for alternative solutions.
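
To make the trade-off concrete, here is a minimal sketch of the two-field
idea using plain Java collections. It is a toy model, not Lucene's or Solr's
actual index structure; the class and method names (TwoFieldIndexSketch,
add, search) are made up for illustration. Every token is indexed twice,
once as written and once lowercased, which is where the roughly 2x growth
comes from.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: plain Java collections, not Lucene's real index structures.
class TwoFieldIndexSketch {
    // "Field" 1: terms exactly as they appear in the documents.
    private final Map<String, Set<Integer>> exact = new TreeMap<>();
    // "Field" 2: the same terms, lowercased.
    private final Map<String, Set<Integer>> lower = new TreeMap<>();

    // Index a document by splitting on whitespace and filling both maps.
    void add(int docId, String text) {
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            exact.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            lower.computeIfAbsent(token.toLowerCase(Locale.ROOT), k -> new TreeSet<>())
                 .add(docId);
        }
    }

    // The matching mode is chosen at query time by picking which map to read.
    Set<Integer> search(String term, boolean caseSensitive) {
        Set<Integer> hits = caseSensitive
                ? exact.get(term)
                : lower.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<Integer>emptySet() : hits;
    }
}
```

With the two documents from the table in my original mail, a case-sensitive
search for "Quick" would match only Doc_2, while a case-insensitive search
would match both documents.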


--
Regards,
Apurv Verma



On Tue, Nov 25, 2014 at 5:15 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
wrote:

> Hi Apurv,
>
> You can create an additional field for case sensitive search, and then you
> can switch at query time. You will have two fields (text_ci and text_lower)
> with different analysers populated with copyField.
>
> Ahmet
>
>
> On Tuesday, November 25, 2014 1:39 PM, Apurv Verma <apurv@bloomreach.com>
> wrote:
> Hey all,
> The standard solution to doing a case-insensitive match in Lucene is to
> use a lowercase filter at index and query time. However, this does not
> preserve the content of the original document. For example, if my inverted
> index is:
>
> Term    | Doc_1 | Doc_2
> -------------------------
> Quick   |       |  X
> The     |   X   |
> brown   |   X   |  X
> dog     |   X   |
> dogs    |       |  X
> fox     |   X   |
> foxes   |       |  X
> in      |       |  X
> jumped  |   X   |
> lazy    |   X   |  X
> leap    |       |  X
> over    |   X   |  X
> quick   |   X   |
> summer  |       |  X
> the     |   X   |
> ------------------------
>
> Is it possible to choose between case-insensitive and case-sensitive
> matching at query time? The index is stored in memory in Solr. My question
> is: if this is stored as a hashmap with string keys, can I override the
> hashcode so that "Quick" and "quick" return the same hash value?
>
> Has anyone attempted this before? Is my assumption about the index right?
> What would be the classes and code flow to look at?
>
> --
> Regards,
> Apurv
>
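
On the hashcode idea in the original question: java.lang.String is final, so
its hashCode cannot be overridden directly. A case-folding key is still easy
to emulate, for example with a map ordered by String.CASE_INSENSITIVE_ORDER.
The sketch below is a toy model under that assumption, not Lucene's term
dictionary (which, as far as I know, is a sorted structure rather than a
hashmap). The class and method names are hypothetical. Each folded key keeps
its original-case variants, so postings are stored once and the matching mode
can still be chosen at query time.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: a sorted map standing in for a term dictionary.
class CaseFoldingKeySketch {
    // Outer map: CASE_INSENSITIVE_ORDER treats "Quick" and "quick" as one key.
    // Inner map: original-case variant -> ids of documents containing it.
    private final Map<String, Map<String, Set<Integer>>> terms =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    void add(int docId, String token) {
        terms.computeIfAbsent(token, k -> new TreeMap<>())
             .computeIfAbsent(token, k -> new TreeSet<>())
             .add(docId);
    }

    Set<Integer> search(String term, boolean caseSensitive) {
        Set<Integer> hits = new TreeSet<>();
        Map<String, Set<Integer>> variants = terms.get(term);
        if (variants == null) {
            return hits;
        }
        for (Map.Entry<String, Set<Integer>> e : variants.entrySet()) {
            // Case-insensitive: take every variant. Case-sensitive: exact variant only.
            if (!caseSensitive || e.getKey().equals(term)) {
                hits.addAll(e.getValue());
            }
        }
        return hits;
    }
}
```

The cost moves from index size to query time: a case-insensitive lookup has
to merge the postings of every variant stored under the folded key.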
