From: Apurv Verma
Date: Tue, 25 Nov 2014 17:05:37 +0530
Subject: Case Insensitive Matching in Solr/Lucene
To: java-user@lucene.apache.org, solr-user@lucene.apache.org

Hey all,

The standard solution for case-insensitive matching in Lucene is to apply a lowercase filter at both index and query time. However, this does not preserve the content of the original document. For example, suppose my inverted index is:

Term    | Doc_1 | Doc_2
--------+-------+------
Quick   |       |   X
The     |   X   |
brown   |   X   |   X
dog     |   X   |
dogs    |       |   X
fox     |   X   |
foxes   |       |   X
in      |       |   X
jumped  |   X   |
lazy    |   X   |   X
leap    |       |   X
over    |   X   |   X
quick   |   X   |
summer  |       |   X
the     |   X   |
--------+-------+------

Is it possible to choose between case-insensitive and case-sensitive matching at query time?

As I understand it, the index is held in memory in Solr. My question is: if it is stored as a hash map with string keys, can I override the hash code so that "Quick" and "quick" return the same hash value?

Has anyone attempted this before? Is my assumption about the index right? Which classes and code flow should I look at?

--
Regards,
Apurv
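To make the question concrete, here is a minimal plain-Java sketch of the behaviour being asked for. It is a toy model only and makes no claim about how Lucene or Solr actually store their term dictionary. The class name CaseAwareIndex, its methods, and the two sample sentences (chosen so that they produce the terms in the table above) are all hypothetical. Rather than overriding the hash code, the sketch keeps the exact-case terms as keys and adds a second map from each lowercased term to its case variants, so the caller can pick either mode at query time.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: this is NOT how Lucene or Solr store their index.
public class CaseAwareIndex {
    // Exact (case-preserving) term -> ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    // Lowercased term -> exact-case variants seen at index time.
    private final Map<String, Set<String>> variants = new TreeMap<>();

    // Index a document by splitting its text on whitespace.
    public void add(int docId, String text) {
        for (String term : text.split("\\s+")) {
            if (term.isEmpty()) {
                continue; // leading whitespace yields an empty first token
            }
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            variants.computeIfAbsent(term.toLowerCase(Locale.ROOT), k -> new TreeSet<>())
                    .add(term);
        }
    }

    // Look up a single term; the caller picks the matching mode per query.
    public Set<Integer> search(String term, boolean caseSensitive) {
        if (caseSensitive) {
            return postings.getOrDefault(term, Collections.<Integer>emptySet());
        }
        // Case-insensitive: union the postings of every stored case variant.
        Set<Integer> hits = new TreeSet<>();
        Set<String> forms = variants.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.<String>emptySet());
        for (String form : forms) {
            hits.addAll(postings.get(form));
        }
        return hits;
    }

    public static void main(String[] args) {
        CaseAwareIndex index = new CaseAwareIndex();
        index.add(1, "The quick brown fox jumped over the lazy dog");
        index.add(2, "Quick brown foxes leap over lazy dogs in summer");

        System.out.println(index.search("quick", true));  // [1]
        System.out.println(index.search("Quick", true));  // [2]
        System.out.println(index.search("QUICK", false)); // [1, 2]
    }
}
```

The cost of this design is one extra lookup for case-insensitive queries; in exchange, the original case of every term stays available for exact matching.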