Date: Tue, 16 Jan 2007 15:01:04 -0800 (PST)
From: Chris Hostetter
To: solr-user@lucene.apache.org
Subject: Re: Apostrophes in fields

: This problem is why some sloppiness is recommended when dealing with
: WordDelimiterFilter, particularly when using the generate___Parts="true" options.

Nick: if you want simpler matching like this, you might want to consider simplifying your definition of "text" ... if you look at the "textTight" fieldtype in the example schema (used by the field "sku") you'll see a simpler usage of WordDelimiterFilter.

alternatively you may just want to use lucene's basic StandardAnalyzer ... i believe it strips apostrophes.

as a real last resort, you could use the recently added PatternReplaceFilter to strip out apostrophes prior to WordDelimiterFilter (if you like everything WordDelim does for you except splitting on apostrophes).

: - optionally index ohara at *both* "o" and "hara"

then searching for "Shelley ohara memorial" fails unless you have slop ... if you need slop, you might as well not index it twice (not to mention it throws off the tf/idf calculations).

: - pick the "alignment" based on the token position in the stream...
: right-justify the catenations if it's the first token, otherwise
: left-justify. One could try to identify proper names and do the
: justification correctly too (blech).

oh for the love of god please no.

-Hoss
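To make the position argument concrete, here is a toy Python model. It is not Solr or Lucene code: `analyze()` and `min_slop()` are hypothetical helpers. `analyze()` splits each whitespace token on non-alphanumerics and adds a catenated token at an assumed position ("first", "last", or "both"), with an optional apostrophe-stripping step standing in for a pattern-replace filter run before the split. `min_slop()` approximates sloppy phrase matching as the spread of (document position minus query offset) across the query terms. Both are simplifications under those assumptions.

```python
import itertools
import re
from typing import List, Optional, Tuple

Token = Tuple[int, str]  # (position, term)


def analyze(text: str, strip_apostrophes: bool = False,
            catenate_at: str = "last") -> List[Token]:
    """Toy analyzer, NOT Solr's WordDelimiterFilter: whitespace split,
    lowercase, optionally strip apostrophes, then split each word on
    non-alphanumerics and add one catenated token per multi-part word."""
    if catenate_at not in ("first", "last", "both"):
        raise ValueError("catenate_at must be 'first', 'last' or 'both'")
    out: List[Token] = []
    pos = 0
    for word in text.lower().split():
        if strip_apostrophes:
            # Stand-in for a pattern-replace step placed before the split.
            word = word.replace("'", "")
        parts = [p for p in re.split(r"[^a-z0-9]+", word) if p]
        if not parts:
            continue
        first = pos
        for part in parts:
            out.append((pos, part))
            pos += 1
        if len(parts) > 1:
            # Where the catenation sits is the "alignment" question.
            cat = "".join(parts)
            if catenate_at in ("first", "both"):
                out.append((first, cat))
            if catenate_at in ("last", "both"):
                out.append((pos - 1, cat))
    return out


def min_slop(doc: List[Token], query: List[str]) -> Optional[int]:
    """Smallest slop at which `query` matches `doc` as a phrase, using a
    simplified measure: the spread of (doc position minus query offset)
    across the terms. Returns None if a query term is missing."""
    candidates = []
    for term in query:
        found = [p for p, t in doc if t == term]
        if not found:
            return None
        candidates.append(found)
    best = None
    for combo in itertools.product(*candidates):
        rel = [p - i for i, p in enumerate(combo)]
        spread = max(rel) - min(rel)
        if best is None or spread < best:
            best = spread
    return best


if __name__ == "__main__":
    doc = "Shelley O'Hara memorial"
    query = ["shelley", "ohara", "memorial"]
    for mode in ("first", "last", "both"):
        print(mode, min_slop(analyze(doc, catenate_at=mode), query))  # 1 each
    print("stripped", min_slop(analyze(doc, strip_apostrophes=True), query))  # 0
```

Under this model "o" and "hara" occupy two positions, so "memorial" sits one position later than the three-term query expects. Wherever the catenated "ohara" is placed (first, last, or both), the minimum slop stays at 1, which matches the point that indexing it twice buys nothing. Stripping apostrophes first keeps "ohara" as a single token and the exact phrase matches with slop 0.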