From: Scott Ribe
Date: Sun, 15 Nov 2009 14:58:38 -0700
Subject: Polishing up my Lucene integration, customizing analyzer

I bought the original Lucene in Action, read it, and set up integration with my system--a small Java daemon that monitors the db for changes & updates the index, and listens for queries and processes them...

Now I'd like to customize query parsing to better fit the particular application and users. I'm thinking I need a customized analyzer that:

- Handles email addresses, acronyms, etc. the way StandardAnalyzer does.
- Turns stop words into Nutch-style bigrams.
- Defaults to "AND" instead of "OR".
- Defaults to in-order phrase queries instead of unordered proximities.

A lot has changed since 2004, as you guys know ;-) So I waded through release notes & docs, found many of the differences that mattered for my use, and got it working with 2.9.0. But I'm a bit lost as to how to get that combination of features in an analyzer--obviously a couple of them are simple settings to StandardAnalyzer, but not all, particularly those first two items...

Any hints or directions appreciated.

-- 
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice
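
To make the stop-word item concrete, here is a minimal plain-Java sketch of one possible reading of "Nutch-style bigrams": a stop word is never emitted on its own, but is joined to its neighbour so that phrases containing it stay searchable. It uses no Lucene or Nutch API at all; the class name, the tiny stop list, and the "_" separator are assumptions made up for illustration, and the real Nutch behaviour may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene or Nutch.
class StopWordBigrams {
    // Illustrative stop list only; a real one would be much longer.
    static final Set<String> STOP_WORDS =
        new HashSet<String>(Arrays.asList("the", "of", "a", "to"));

    // Emits every non-stop token as-is, plus a joined bigram for each
    // adjacent pair in which at least one token is a stop word.
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < tokens.size(); i++) {
            String cur = tokens.get(i);
            if (!STOP_WORDS.contains(cur)) {
                out.add(cur); // ordinary terms are kept unchanged
            }
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (STOP_WORDS.contains(cur) || STOP_WORDS.contains(next)) {
                    out.add(cur + "_" + next); // stop word survives only inside a bigram
                }
            }
        }
        return out;
    }
}
```

With this sketch, the tokens "the lord of the rings" become the_lord, lord, lord_of, of_the, the_rings, rings. In a real analyzer the same logic would presumably live in a token filter placed after the tokenizer, so that the email and acronym handling stays untouched.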