Message-ID: <4E24056F.6090908@mufin.com>
Date: Mon, 18 Jul 2011 12:05:35 +0200
From: Thomas Rewig
Organization: mufin
To: java-user@lucene.apache.org
Subject: Re: TermQuery - ExactMatching, Lucene 3.1.0 vs. 3.3.0, special character behavior
References: <4E204844.6020502@mufin.com>

Hi Ian,

yes, the scores are identical, but the relative order of documents with the same score seems to differ between the versions. In Lucene 3.3.0 it seems that terms with special characters are ranked before the exact hit.

My code is:

    PhraseQuery query = new PhraseQuery();
    query.add(new Term("name", strQueryName));
    //topDocs = this.indexSeacher.search(query, 10);
    //topDocs = this.indexSeacher.search(query, 10, Sort.RELEVANCE);
    topDocs = this.indexSeacher.search(query, 10, Sort.INDEXORDER);

All three variants show similar ordering problems, although not always for the same query. E.g. if I sort by Sort.RELEVANCE, the "queen" doc problem doesn't occur, but the ordering is wrong for the token "aim" (query name:aim):

    0 Score=12,2324 Doc.Id=8060 id=709579 name=aim溝脇しほみ
    1 Score=12,2324 Doc.Id=227606 id=716893 name=aim

Is there a way to guarantee the relative order of documents with the same score? Or how can I prevent documents with special characters from getting the same score as documents that match exactly?

Thanks in advance!
Thomas

On 18.07.2011 10:08, Ian Lea wrote:
> I'm not sure what you are getting at. A search using 3.1.0 and 3.3.0
> returns the same docs with identical scores, except that one gives
> them in order A,B and the other in order B,A? What search method are
> you using? Does it guarantee anything about the order of returning
> docs with identical scores?
>
>
> --
> Ian.
>
>
> On Fri, Jul 15, 2011 at 3:01 PM, Thomas Rewig wrote:
>> Hello,
>>
>> there is an index with a lot of docs; two of them are:
>>
>> doc1:
>>
>> 1.Field=id ITSVopfOLB=ITS---f0-- Value= 192
>> 2.Field=name ITSVopfOLB=ITS----0-- Value= queen
>>
>> doc2:
>>
>> 1.Field=id ITSVopfOLB=ITS---f0-- Value= 701492
>> 2.Field=name ITSVopfOLB=ITS----0-- Value= queen板野友美 (Here are Chinese
>> characters - hopefully you can see them)
>>
>> If I search the index with a TermQuery, Lucene 3.1.0 and 3.3.0 behave
>> differently:
>>
>> Query:
>>
>> Term:field='name' text='queen'
>>
>> Result Lucene 3.1.0:
>>
>> 0 Score=13,2132 Doc.Id=176002 id=192 name=queen
>> 1 Score=13,2132 Doc.Id=523407 id=701492 name=queen板野友美
>>
>> Result Lucene 3.3.0:
>>
>> 0 Score=13,2132 Doc.Id=523407 id=701492 name=queen板野友美
>> 1 Score=13,2132 Doc.Id=176002 id=192 name=queen
>>
>> The result from Lucene 3.1.0 is what I would expect from an 'exact
>> matching' TermQuery.
>> Each index was built with its associated Lucene version.
>> I tested it with Luke and with my own code - the result was always the same.
>>
>> Is it a new feature in Lucene 3.3.0 or a bug?
>>
>> Thanks in advance!
>> Thomas
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
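
The question in this thread is how to get a stable order among hits whose scores are identical. One general approach is an explicit tie-breaker. The sketch below is plain Java and deliberately does not use the Lucene API: the `Hit` class, its fields, and the `rank` method are hypothetical names made up for illustration. It sorts by score (highest first), then, as an assumed heuristic, by shorter name first so that an exact match such as "queen" precedes "queen板野友美", and finally by doc id so the result is fully deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieBreakDemo {

    /** Hypothetical stand-in for one search hit; this is not a Lucene class. */
    static final class Hit {
        final float score;
        final int docId;
        final String name;

        Hit(float score, int docId, String name) {
            this.score = score;
            this.docId = docId;
            this.name = name;
        }
    }

    // Primary key: score, highest first.
    // Tie-breaker 1 (assumption): shorter name first, so the exact match wins.
    // Tie-breaker 2: doc id ascending, so equal hits never swap places.
    static final Comparator<Hit> BY_SCORE_THEN_NAME_LENGTH =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparingInt(h -> h.name.length())
                    .thenComparingInt(h -> h.docId);

    /** Returns a sorted copy; the input list is left untouched. */
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(BY_SCORE_THEN_NAME_LENGTH);
        return sorted;
    }

    public static void main(String[] args) {
        // The two hits from the thread, in the order Lucene 3.3.0 returned them.
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(13.2132f, 523407, "queen板野友美"));
        hits.add(new Hit(13.2132f, 176002, "queen"));

        for (Hit h : rank(hits)) {
            System.out.println(h.score + " " + h.docId + " " + h.name);
        }
    }
}
```

Within Lucene itself, a `Sort` built from several `SortField`s (score first, then a secondary field) may serve the same purpose; which fields and types are available should be checked against the API of the version in use.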