From: Jeff Rodenburg
To: java-user@lucene.apache.org
Subject: Suggestions for analysis
Date: Wed, 21 Sep 2005 21:14:57 -0700
Message-ID: <50f433360509212114242cf42e@mail.gmail.com>

I'm looking for some suggestions on an analyzer decision. I've got my own thoughts on this already, but would like some initial feedback first.

The scenario:

- An index of geographic information: cities, towns, states, neighborhoods, zipcodes, generic names, etc. Examples are "New York, NY", "New York", "Midtown", "10012", "The Big Apple".
- I have these mapped to underlying geographic data points: census data, postal data, mapping data, etc.
- I want some of these to carry more precedence than others when conflicting/matching terms exist, i.e. a search for "Washington" should score Washington D.C. higher than the state of Washington. This would be decided on an item-by-item basis, and not dictated by one broad field.
- I need the right mix for searches to work as I expect. As an example, a search for "Wedgewood WA" would ideally not match "Wedgewood GA".

I'm starting with the StandardAnalyzer and thinking of possibly extending it to carry some of the business rules meant to come into play for tie-breakers.

Comments appreciated.

Thanks,
jeff r.
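
To make the two requirements in the scenario concrete (item-by-item precedence, and "Wedgewood WA" not matching "Wedgewood GA"), here is a minimal sketch in plain Java. It deliberately uses no Lucene classes. The class name `GeoTieBreakSketch`, the `Place` type, the boost values, and the sample entries are all hypothetical and chosen only for illustration. The idea roughly corresponds to a per-document boost combined with required query terms, which may fit better than putting the rules into the analyzer itself, but that is an assumption and not something the scenario dictates.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class GeoTieBreakSketch {

    /** One indexed place name with a hand-assigned precedence weight (hypothetical). */
    static final class Place {
        final String name;
        final double boost;
        final Set<String> tokens;

        Place(String name, double boost) {
            this.name = name;
            this.boost = boost;
            this.tokens = tokenize(name);
        }
    }

    /** Lowercases the text and splits it on anything that is not a letter or digit. */
    static Set<String> tokenize(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Illustrative data only; the boosts are assumptions, not real weights. */
    static List<Place> sampleIndex() {
        List<Place> index = new ArrayList<>();
        index.add(new Place("Washington (state)", 1.0));
        index.add(new Place("Washington, DC", 2.0));
        index.add(new Place("Wedgewood, GA", 1.0));
        index.add(new Place("Wedgewood, WA", 1.0));
        return index;
    }

    /**
     * Returns the names of places containing every query token,
     * ordered by descending per-item boost.
     */
    static List<String> search(List<Place> index, String query) {
        Set<String> wanted = tokenize(query);
        List<Place> hits = new ArrayList<>();
        for (Place p : index) {
            // All query terms are required, so "wa" excludes "Wedgewood, GA".
            if (!wanted.isEmpty() && p.tokens.containsAll(wanted)) {
                hits.add(p);
            }
        }
        // The per-item boost acts as the tie-breaker between matching entries.
        hits.sort((a, b) -> Double.compare(b.boost, a.boost));
        List<String> names = new ArrayList<>();
        for (Place p : hits) {
            names.add(p.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Place> index = sampleIndex();
        System.out.println(search(index, "Washington"));
        System.out.println(search(index, "Wedgewood WA"));
    }
}
```

Because the tokens are compared as whole words, the query term "wa" does not match inside "washington"; only an entry that carries "wa" as its own token qualifies.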