lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thomas Rewig <tre...@mufin.com>
Subject Re: TermQuery - ExactMatching, Lucene 3.1.0 vs. 3.3.0, special character behavior
Date Mon, 18 Jul 2011 10:05:35 GMT
  Hi Ian,

yes the score is identical but the inner ordering of same scores seems 
to be different in the versions.

In Lucene 3.3.0 it seems that terms with special characters will be 
preferred before the exact hit.

My code is:

     PhraseQuery query = new PhraseQuery();
         query.add(new Term("name", strQueryName));

         //topDocs = this.indexSeacher.search(query, 10);
         //topDocs = this.indexSeacher.search(query, 10, Sort.RELEVANCE);
             topDocs = this.indexSeacher.search(query, 10, Sort.INDEXORDER);

In all variants there are similar ordering problems even if they do not 
always occur at the same query.
e.g. if I order by Sort.RELEVANCE the "queen" Doc problem doesn't occur 
but there is a wrong ordering in the token aim (query name:aim)


0    Score=12,2324     Doc.Id=8060       id=709579     name=aim溝脇しほみ
1    Score=12,2324    Doc.Id=227606   id=716893      name=aim


Is there a way to guarantee the inner sorting of same scores? Or how can 
I avoid that documente with special characters have the same score as 
documente of exact matches?

Thanks in advance!
Thomas




Am 18.07.2011 10:08, schrieb Ian Lea:
> I'm not sure what you are getting at.  A search using 3.1.0 and 3.3.0
> returns the same docs with identical scores, except that one gives
> them in order A,B and the other in order B,A?  What search method are
> you using?  Does it guarantee anything about the order of returning
> docs with identical scores?
>
>
> --
> Ian.
>
>
> On Fri, Jul 15, 2011 at 3:01 PM, Thomas Rewig<trewig@mufin.com>  wrote:
>>   Hello,
>>
>> there is a index with a lot of docs, 2 of them are:
>>
>> doc1:
>>
>>     1.Field=id            ITSVopfOLB=ITS---f0-- Value= 192
>>     2.Field=name     ITSVopfOLB=ITS----0-- Value= queen
>>
>> doc2:
>>
>>     1.Field=id            ITSVopfOLB=ITS---f0-- Value= 701492
>>     2.Field=name     ITSVopfOLB=ITS----0-- Value= queen板野友美 (Here are chinese
>> characters - hopefully you can see them)
>>
>> if I search in the index - with a TermQuery there is a different behavior
>> between Lucene 3.1.0 and 3.3.0 :
>>
>> Query:
>>
>>     Term:field='name' text='queen'
>>
>> Result Lucene 3.1.0:
>>
>>     0    Score=13,2132    Doc.Id=176002    id=192 name=queen
>>     1    Score=13,2132    Doc.Id=523407    id=701492 name=queen板野友美
>>
>> Result Lucene 3.3.0:
>>
>>     0    Score=13,2132    Doc.Id=523407    id=701492 name=queen板野友美
>>     1    Score=13,2132    Doc.Id=176002    id=192 name=queen
>>
>> The result from Lucene 3.1.0 is that, what I would expect if I do a 'exact
>> matching' Term Query.
>> Each index was indexed with its associated LuceneVersion.
>> I tested it with luke and with my own Code - the result was always the same.
>>
>> Is it a new feature in Lucene 3.3.0 or a bug?
>>
>> Thanks in advance!
>> Thomas
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message