Message-ID: <4D43F561.60000@fastmail.fm>
Date: Sat, 29 Jan 2011 11:09:21 +0000
From: Paul Taylor
Reply-To: paul_t100@fastmail.fm
To: java-user@lucene.apache.org
CC: Koji Sekiguchi
Subject: Re: Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch
References: <4D383669.1090003@fastmail.fm> <4D3AFB2D.5070505@r.email.ne.jp> <4D3DB37E.90700@fastmail.fm> <4D43712A.2000800@r.email.ne.jp>
In-Reply-To: <4D43712A.2000800@r.email.ne.jp>

On 29/01/2011 01:45, Koji Sekiguchi wrote:
> (11/01/25 2:14), Paul Taylor wrote:
>> On 22/01/2011 15:43, Koji Sekiguchi wrote:
>>> (11/01/20 22:19), Paul Taylor wrote:
>>>> Trying to extend MappingCharFilter so that it only changes a token
>>>> if the length of the token matches the length of singleMatch in
>>>> NormalizeCharMap (currently the singleMatch just has to be found in
>>>> the token; I want it to match the whole token). Can this be done? It
>>>> sounds simple enough, but I cannot make any headway understanding
>>>> the MappingCharFilter source code.
>>>>
>>>> thanks Paul
>>>
>>> Paul,
>>>
>>> Can you give us a concrete input/output (you wanted) with mapping table
>>> so that I can understand what you want?
>>>
>>> Thanks,
>>>
>>> Koji
>> Sure
>>
>> charConvertMap.add("!!!","ApostropheApostropheApostrophe");
>> charConvertMap.add("*** ***","StarStarStar");
>> charConvertMap.add("!","Apostrophe");
>>
>> Normally, punctuation gets removed during indexing and searching, which
>> is what I want for good search results, but when the token only contains
>> specific punctuation strings I don't want to remove the punctuation
>> because it would make it impossible to match, so I convert it to a
>> textual representation.
>>
>> As it stands, in the 3rd case '!' will be preserved wherever it is
>> found, so to get a good match on 'Wow!' you would have to search for
>> 'Wow!'. But I want you to be able to search for 'Wow' and have it
>> return 'Wow!', which is the case if "!" isn't in the char convert map;
>> but if you searched for '!' I want it to return the token which is just
>> '!', which is only the case if the value is added to the map.
>>
>> I need to do this because the text we are indexing and searching are
>> short strings representing a music artist name (there is an artist
>> called !!!)
>>
>> thanks Paul
>>
>>
> Hi Paul,
>
> Still I'm not sure I understand your issue correctly, but if you want:
>
> query="Wow!" result="Wow!"
> query="Wow" result="Wow!"
> query="!" result="Wow!"
> query="!!!" result="!!!"
>
> do the following maps solve your problem?
> (I assume you use Whitespace-type-Tokenizer here)
>
> charConvertMap.add("!!!","ApostropheApostropheApostrophe");
> charConvertMap.add("!"," Apostrophe"); // there is a space in front of "!"
>
> Koji

No, your solution would convert all cases of apostrophe, which is not what I want, and the list of names I need to do this for is much larger than the two examples I give here, so you cannot rely on the order they are added in.

Is it possible to help me with the original question: how do I subclass MappingCharFilter so that it only changes complete matching tokens?
That is, if my charConvertMap contained

charConvertMap.add("!!!","ApostropheApostropheApostrophe");

it would convert a token of '!!!' to 'ApostropheApostropheApostrophe', but a token of 'Hello!!!' becomes 'Hello'.

Paul
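
To make the behaviour I am after concrete, here is a minimal sketch in plain Java. It is not a MappingCharFilter subclass and uses no Lucene classes, so treat it only as an illustration of the semantics. WholeTokenMapper, mapWholeTokens and stripPunctuation are hypothetical names; stripPunctuation merely stands in for whatever the real analyzer does to punctuation, and tokens are assumed to be whitespace-delimited, so a map key that contains a space can never match.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: plain Java, no Lucene classes involved.
final class WholeTokenMapper {

    // Replace a whitespace-delimited token only when the ENTIRE token
    // equals a key in the map; every other token is passed through as-is.
    static String mapWholeTokens(String input, Map<String, String> map) {
        StringBuilder out = new StringBuilder();
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields one empty token
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(map.getOrDefault(token, token));
        }
        return out.toString();
    }

    // Assumed stand-in for the later analysis step that drops punctuation:
    // removes everything except letters, digits and whitespace.
    static String stripPunctuation(String input) {
        return input.replaceAll("[^\\p{L}\\p{N}\\s]", "");
    }

    public static void main(String[] args) {
        Map<String, String> charConvertMap = new LinkedHashMap<>();
        charConvertMap.put("!!!", "ApostropheApostropheApostrophe");
        charConvertMap.put("!", "Apostrophe");

        for (String name : new String[] {"!!!", "Hello!!!", "Wow!", "!"}) {
            String mapped = mapWholeTokens(name, charConvertMap);
            System.out.println(name + " -> " + stripPunctuation(mapped));
        }
    }
}
```

With this, a token of '!!!' survives as 'ApostropheApostropheApostrophe' while 'Hello!!!' ends up as 'Hello', and the order in which entries are added to the map does not matter because the lookup is an exact match on the whole token.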