Mailing-List: contact solr-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: solr-user@lucene.apache.org
Received-SPF: neutral (herse.apache.org: local policy)
Date: Tue, 16 Jan 2007 15:01:04 -0800 (PST)
From: Chris Hostetter <hossman_lucene@fucit.org>
To: solr-user@lucene.apache.org
Subject: Re: Apostrophes in fields
In-Reply-To: <c68e39170701161426v341f245fqef62aed6d10d7e61@mail.gmail.com>
Message-ID: <Pine.LNX.4.58.0701161454440.10805@hal.rescomp.berkeley.edu>
References: <1c5cd97c0701151452v223bade9q3b48cd00ba26daef@mail.gmail.com>
 <50f433360701151527p672155d4i68f7e032c115417f@mail.gmail.com>
 <f767f0600701152243y579646fbx48764d430e4f2204@mail.gmail.com>
 <1c5cd97c0701161324l861714bv768ce61ca57b579@mail.gmail.com>
 <3d2ce8cb0701161412r4c441a8cn340382c309e8ee1a@mail.gmail.com>
 <c68e39170701161426v341f245fqef62aed6d10d7e61@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII


: This problem is why some sloppiness is recommended when dealing with
: WordDelimiterFilter.

particularly when using the generate*Parts="true" options
(generateWordParts, generateNumberParts)

Nick: if you want simpler matching like this, you might want to consider
simplifying your definition of "text" ... if you look at the "textTight"
fieldtype in the example schema (used by the field "sku") you'll see a
simpler usage of WordDelimiterFilter ... alternately you may just want to
use lucene's basic StandardAnalyzer ... i believe it strips apostrophes.
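
for illustration, a trimmed-down sketch in the spirit of "textTight" ... this
is written from memory, not copied from the example schema, and the name
"textTightSketch" is made up, so check it against your own copy. the intent
is that no separate word parts are generated and only the catenated form is
indexed, so "O'Hara" should end up as a single token:

```xml
<!-- sketch only: hypothetical fieldtype name, attribute values are assumptions -->
<fieldtype name="textTightSketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- no separate word/number parts; keep only the catenated form -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```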

as a real last resort, you could use the recently added
PatternReplaceFilter to strip out apostrophes prior to
WordDelimiterFilter (if you like everything WordDelim does for you except
splitting on apostrophes)
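
a rough, untested sketch of that last-resort setup ... the fieldtype name
"textNoApos" and the WordDelimiterFilter settings are just assumptions for
the example, the only point is the filter order (pattern replace first,
then WordDelimiterFilter):

```xml
<!-- sketch only: hypothetical fieldtype name, tune the WordDelimiterFilter flags to taste -->
<fieldtype name="textNoApos" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- drop every apostrophe in each token before WordDelimiterFilter sees it -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="'" replacement="" replace="all"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```

the same analyzer needs to apply at query time too, otherwise a query for
"o'hara" still gets split the old way.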

:   - optionally index ohara at *both* "o" and "hara"

then searching for "Shelley ohara memorial" fails unless you have
slop ... if you need slop, you might as well not index it twice (not to
mention it throws off the tf/idf calculations)

:   - pick the "alignment" based on the token position in the stream...
: right-justify the catenations if it's the first token, otherwise
: left-justify.  One could try to identify proper names and do the
: justification correctly too (blech).

oh for the love of god please no.


-Hoss