From: Paul Taylor
Reply-To: paul_t100@fastmail.fm
To: java-user@lucene.apache.org
Date: Mon, 25 Feb 2013 10:24:48 +0000
Subject: Re: Not getting matches for analyzers using CharMappingFilter with Lucene 4.1
Message-ID: <512B3BF0.7030401@fastmail.fm>
In-Reply-To: <5124B363.6010101@fastmail.fm>

On 20/02/2013 11:28, Paul Taylor wrote:
> Just updating codebase from Lucene 3.6 to Lucene 4.1 and it seems my
> tests that use NormalizeCharMap for replacing characters in the
> analyzers are not working.
>
bump, anybody? I thought a self-contained test case would be enough to
pique somebody's interest. Am I doing something silly? Maybe, but I
can't see it.

Paul

> Below I've created a self-contained test case; this is the output when
> I run it:
>
> --term=and--
> --term=gold--
> --term=platinum--
> name:"platinum and gold"
> Size1
> name:"platinum & gold"
> Size0
>
> java.lang.AssertionError:
> Expected :1
> Actual :0
>
>     at org.junit.Assert.fail(Assert.java:93)
>     at org.junit.Assert.failNotEquals(Assert.java:647)
>     at org.junit.Assert.assertEquals(Assert.java:128)
>     at org.junit.Assert.assertEquals(Assert.java:472)
>     at org.junit.Assert.assertEquals(Assert.java:456)
>     at org.musicbrainz.search.analysis.Lucene41CharFilterTest.testAmpersandSearching(Lucene41CharFilterTest.java:89)
>
> As you can see, the char filter does seem to work, because the text
> 'platinum & gold' is converted to the three terms 'platinum', 'and', 'gold'.
> In fact, search works for "platinum and gold" but not for
> the original "platinum & gold", even though both index and search use
> the same analyzer. Maybe the problem is with the query parser, but it is
> certainly related to 4.1 because this worked previously.
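
To make the expectation in the quoted paragraph concrete, here is a minimal sketch in plain Java (no Lucene) of what the analysis chain is meant to do: map "&" to "and", split into tokens, lowercase. The class and method names are hypothetical, and whitespace splitting is an assumption standing in for MusicbrainzTokenizer, whose rules are not shown in this message. If index-time and query-time text both pass through the same steps, the two phrases yield identical term sequences, which is why the second assertion is expected to pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the analysis chain in the test case; not Lucene code.
class AmpersandNormalizationSketch {

    // Character mapping applied to the raw text before tokenizing
    // (the role NormalizeCharMap/MappingCharFilter play in the test).
    static String mapChars(String text) {
        return text.replace("&", "and");
    }

    // Map characters, split on whitespace (an assumption, see above),
    // then lowercase each token.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : mapChars(text).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Both inputs should print the same term list.
        System.out.println(analyze("platinum & gold"));
        System.out.println(analyze("platinum and gold"));
    }
}
```

This only models the intended result; it says nothing about where in Lucene 4.1 the mapping is or is not being applied at query time.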
>
> thanks Paul
>
> package org.musicbrainz.search.analysis;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.Tokenizer;
> import org.apache.lucene.analysis.charfilter.MappingCharFilter;
> import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
> import org.apache.lucene.analysis.core.LowerCaseFilter;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.*;
> import org.apache.lucene.queryparser.classic.QueryParser;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.TopDocs;
> import org.apache.lucene.store.RAMDirectory;
> import org.apache.lucene.util.BytesRef;
> import org.apache.lucene.util.Version;
> import org.junit.Test;
> import java.io.Reader;
>
> import static org.junit.Assert.assertEquals;
>
> public class Lucene41CharFilterTest
> {
>     class SimpleAnalyzer extends Analyzer {
>
>         protected NormalizeCharMap charConvertMap;
>
>         // Map "&" to "and" before tokenizing.
>         protected void setCharConvertMap() {
>             NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
>             builder.add("&","and");
>             charConvertMap = builder.build();
>         }
>
>         public SimpleAnalyzer() {
>             setCharConvertMap();
>         }
>
>         @Override
>         protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
>             Tokenizer source = new MusicbrainzTokenizer(Version.LUCENE_41,
>                     new MappingCharFilter(charConvertMap, reader));
>             TokenStream filter = new LowerCaseFilter(Version.LUCENE_41,source);
>             return new TokenStreamComponents(source, filter);
>         }
>     }
>
>     @Test
>     public void testAmpersandSearching() throws Exception {
>
>         Analyzer analyzer = new SimpleAnalyzer();
>         RAMDirectory dir = new RAMDirectory();
>         IndexWriterConfig writerConfig = new IndexWriterConfig(Version.LUCENE_41,analyzer);
>         IndexWriter writer = new IndexWriter(dir, writerConfig);
>         {
>             // Index a single document containing the ampersand.
>             Document doc = new Document();
>             doc.add(new Field("name", "platinum & gold", Field.Store.YES, Field.Index.ANALYZED));
>             writer.addDocument(doc);
>         }
>         writer.close();
>
>         // Print the terms that were actually indexed for the field.
>         IndexReader ir = DirectoryReader.open(dir);
>         Fields fields = MultiFields.getFields(ir);
>         Terms terms = fields.terms("name");
>         TermsEnum termsEnum = terms.iterator(null);
>         BytesRef text;
>         while((text = termsEnum.next()) != null) {
>             System.out.println("--term=" + text.utf8ToString()+"--");
>         }
>         ir.close();
>
>         // Phrase query with "and" spelled out: this one matches.
>         IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));
>         {
>             Query q = new QueryParser(Version.LUCENE_41, "name", analyzer).parse("\"platinum and gold\"");
>             System.out.println(q);
>             TopDocs td = searcher.search(q, 10);
>             System.out.println("Size"+td.scoreDocs.length);
>             assertEquals(1, searcher.search(q, 10).totalHits);
>         }
>
>         // Phrase query with "&": expected to match, but returns 0 hits on 4.1.
>         searcher = new IndexSearcher(IndexReader.open(dir));
>         {
>             Query q = new QueryParser(Version.LUCENE_41, "name", analyzer).parse("\"platinum & gold\"");
>             System.out.println(q);
>             TopDocs td = searcher.search(q, 10);
>             System.out.println("Size"+td.scoreDocs.length);
>             assertEquals(1, searcher.search(q, 10).totalHits);
>         }
>     }
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org