From: Paul Taylor
Date: Mon, 07 Dec 2009 20:48:20 +0000
To: Lucene Users
Subject: Looking for a MappingCharFilter that accepts regular expressions

I want my search to treat 'No. 1' and 'No.1' the same, because in our context it's one token, so I want 'No. 1' to become 'No.1'. I need to do this before tokenizing, because the tokenizer would split 'No. 1' into two terms but keep 'No.1' as a single term.

I already use a MappingCharFilter with a NormalizeCharMap to map '&' to 'and', but I think it only accepts literal text, and I need to:

1. be case insensitive (but LowerCaseFilter is only applied after tokenizing)
2. cope with any number, e.g. 'no. 109'

So I was going to subclass BaseCharFilter and do the matching with a regular expression like ([Nn]+[Oo]+\\.) ([0-9]+), but I'm struggling to understand the offset-correction methods you have to call once you get a match.

Has anyone already written a regular-expression CharFilter, or am I approaching this all wrong?

Thanks,
Paul
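A minimal standalone sketch of the regex-plus-offset-correction idea, in plain Java with no Lucene dependency. It rewrites every 'No. <digits>' to 'No.<digits>' with java.util.regex and, at the end of each rewritten span, records the cumulative number of characters removed so far. correctOffset then maps an offset in the rewritten text back to the original text, which is what a CharFilter has to do so that token offsets (and highlighting) still point at the right characters. The class name RegexCharMapper and its methods are hypothetical. As far as I know, Lucene 2.9's BaseCharFilter keeps the same kind of (offset, cumulative diff) table through its protected addOffCorrectMap(int off, int cumulativeDiff) method, but check the javadoc for your version before relying on that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical standalone sketch, not a Lucene class: applies a regex
 * rewrite to the whole input and remembers how far output offsets have
 * drifted from input offsets, so token offsets can be mapped back.
 */
class RegexCharMapper {
    // "no." in any case, one space, then digits, e.g. "No. 109".
    // \b stops matches in the middle of words such as "Casino. 5".
    private static final Pattern NUMBER_ABBREV =
        Pattern.compile("\\b([Nn][Oo]\\.) ([0-9]+)");

    private final String output;
    // Parallel lists: for output offsets >= outOffsets.get(i),
    // original offset = output offset + diffs.get(i).
    private final List<Integer> outOffsets = new ArrayList<>();
    private final List<Integer> diffs = new ArrayList<>();

    RegexCharMapper(String input) {
        Matcher m = NUMBER_ABBREV.matcher(input);
        StringBuilder sb = new StringBuilder();
        int last = 0;     // end of the previous match in the input
        int cumDiff = 0;  // characters removed so far (input minus output)
        while (m.find()) {
            sb.append(input, last, m.start());          // unchanged text
            String replacement = m.group(1) + m.group(2); // drop the space
            sb.append(replacement);
            int removed = (m.end() - m.start()) - replacement.length();
            if (removed != 0) {
                cumDiff += removed;
                // Record the correction at the end of the rewritten span.
                outOffsets.add(sb.length());
                diffs.add(cumDiff);
            }
            last = m.end();
        }
        sb.append(input, last, input.length());
        output = sb.toString();
    }

    String output() {
        return output;
    }

    /** Maps an offset in the rewritten text back to the original text. */
    int correctOffset(int outOffset) {
        int diff = 0;
        // Use the last correction recorded at or before this offset.
        for (int i = 0; i < outOffsets.size() && outOffsets.get(i) <= outOffset; i++) {
            diff = diffs.get(i);
        }
        return outOffset + diff;
    }

    public static void main(String[] args) {
        RegexCharMapper mapper = new RegexCharMapper("see No. 1 and no. 109 here");
        System.out.println(mapper.output()); // see No.1 and no.109 here
        // Token "No.1" spans output offsets [4, 8); in the input it is [4, 9).
        System.out.println(mapper.correctOffset(4) + "-" + mapper.correctOffset(8)); // 4-9
        // Token "no.109" spans output offsets [13, 19); in the input it is [14, 21).
        System.out.println(mapper.correctOffset(13) + "-" + mapper.correctOffset(19)); // 14-21
    }
}
```

The + quantifiers from the pattern in the question are dropped here, since [Nn]+[Oo]+ would also match strings like 'nnoo.'. Offsets that fall strictly inside a rewritten span are only approximate; token start and end offsets map back exactly.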