lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Top matched data should be on Top
Date Tue, 14 Feb 2012 18:24:54 GMT
You cannot simply count words like this and expect the docs to be ordered
as you imply. The problem is that the lengths of the fields are encoded
in a byte (or perhaps an int, I forget). Thus, some loss of precision
is inherent in the process. You have to encode values from 1 to 2^31
or so in something that's not a long.
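
The precision loss can be sketched in a few lines of plain Java. This is a
rough illustration, not the library code: it assumes the classic default
length norm of 1/sqrt(numTerms) and a one-byte encoding modeled on my
recollection of Lucene's SmallFloat.floatToByte315. The class name
NormSketch and the helper storedNorm are made up for this example.

```java
public class NormSketch {
    // Assumption: mirrors the bit arithmetic of SmallFloat.floatToByte315 as I
    // remember it. The shift keeps the exponent and the top two mantissa bits.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero, negative, or underflow
        }
        if (small >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: saturate at the largest byte value
        }
        return (byte) (small - ((63 - 15) << 3));
    }

    // Inverse of encode: rebuilds a float from the stored byte.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Length norm as it would come back out of the index for a field with
    // numTerms terms and no boost (hypothetical helper).
    static float storedNorm(int numTerms) {
        float exact = (float) (1.0 / Math.sqrt(numTerms));
        return decode(encode(exact));
    }

    public static void main(String[] args) {
        for (int n = 9; n <= 13; n++) {
            System.out.printf("%2d terms: exact=%.4f stored=%.4f%n",
                    n, 1.0 / Math.sqrt(n), storedNorm(n));
        }
    }
}
```

Under these assumptions a 10-term field stores 0.3125 and an 11-term field
stores 0.25, so one extra word costs far more than the exact values (about
0.316 versus 0.302) would suggest, while 11-, 12- and 13-term fields all
collapse to the same stored norm.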

So try attaching &debugQuery=on and examining the output. You'll probably
see that the scores are identical, in which case Solr breaks the ties by
document insertion order (roughly). And looking closely at the debug
information, I suspect you'll see that the length normalization is
the same.
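
With plain Lucene rather than Solr, the same breakdown should be available
from IndexSearcher.explain(query, docId). The arithmetic behind it can be
approximated by hand. The sketch below is an assumption-laden model, not the
library code: it assumes the classic default formulas (idf = 1 + ln(numDocs /
(docFreq + 1)), coord = matched terms / query terms, queryNorm = 1 /
sqrt(sum of idf^2)), every term occurring once, no boosts, and a length norm
truncated to two mantissa bits. ScoreSketch and its methods are made-up names.

```java
import java.util.Arrays;

public class ScoreSketch {
    // Assumed classic idf formula.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Assumed one-byte norm: 1/sqrt(numTerms) with all but the top two
    // mantissa bits dropped (valid for norms in the normal encodable range).
    static float storedLengthNorm(int numTerms) {
        float exact = (float) (1.0 / Math.sqrt(numTerms));
        return Float.intBitsToFloat((Float.floatToRawIntBits(exact) >> 21) << 21);
    }

    // Score of one document for a query of single-occurrence terms.
    static double score(int[] docFreqs, boolean[] matched, int numDocs, int fieldLength) {
        double sumOfSquares = 0.0;   // feeds queryNorm, covers all query terms
        double matchedWeight = 0.0;  // sum of idf^2 over terms the doc contains
        int overlap = 0;
        for (int i = 0; i < docFreqs.length; i++) {
            double w = idf(docFreqs[i], numDocs);
            sumOfSquares += w * w;
            if (matched[i]) {
                matchedWeight += w * w;
                overlap++;
            }
        }
        double queryNorm = 1.0 / Math.sqrt(sumOfSquares);
        double coord = (double) overlap / docFreqs.length;
        return coord * queryNorm * matchedWeight * storedLengthNorm(fieldLength);
    }

    // Returns {Doc1 score, Doc2 score} for the 12-term query quoted below.
    static double[] scores(int numDocs, int aprilDocFreq) {
        int[] docFreqs = new int[12];
        Arrays.fill(docFreqs, 2);      // ten shared terms occur in Doc1 and Doc2
        docFreqs[10] = 0;              // "anchr" occurs in no document
        docFreqs[11] = aprilDocFreq;   // "april": Doc2 only, or Doc2 and Doc3
        boolean[] doc1 = new boolean[12];
        Arrays.fill(doc1, 0, 10, true);
        boolean[] doc2 = doc1.clone();
        doc2[11] = true;
        return new double[] {
            score(docFreqs, doc1, numDocs, 10),
            score(docFreqs, doc2, numDocs, 11)
        };
    }

    public static void main(String[] args) {
        System.out.println("two docs:   " + Arrays.toString(scores(2, 1)));
        System.out.println("three docs: " + Arrays.toString(scores(3, 2)));
    }
}
```

If the assumptions hold, the numbers land close to the scores quoted below:
with two documents the shared terms have a low idf, so "april" carries enough
weight to put Doc2 first; with three documents every matching term has the
same idf and the coarser stored norm of the longer Doc2 tips the order toward
Doc1.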

Best
Erick

On Tue, Feb 14, 2012 at 1:16 PM, A Z <4azfriend@gmail.com> wrote:
> Hi,
>
> When I add three documents, the best-matching text does not come out on
> top, but with only two documents it is displayed properly, as shown in the
> following text.
>
> I am using only the default similarity, with Lucene 3.1.
> I add the following documents:
>
>           writer.addDocument(createDocument("Doc1", "pt carrefour indonesia temp price reduct advertising promotion disc reg"));
>
>           writer.addDocument(createDocument("Doc2", "pt carrefour indonesia temp price reduct advertising promotion reg disc april"));
>
> If I uncomment Doc3 and search for the same string, I get Doc1 on top, but
> when I comment out Doc3, I get Doc2 on top.
> What I want is that, irrespective of the number of documents, the
> best-matching document is on top. Here Doc2 is the document with the most
> matching text, since "april" is an extra matching word compared to Doc1,
> so Doc2 should always be on top.
> //         writer.addDocument(createDocument("Doc3","qrst opq april")); // document 3
>
>
> Searching with the following text:
> "pt carrefour indonesia temp price reduct advertising promotion anchr reg
> disc april"
>
> When only two documents are added [Doc1, Doc2],
> the output is:
> Query (content:pt content:carrefour content:indonesia content:temp
> content:price content:reduct content:advertising content:promotion
> content:anchr content:reg content:disc content:april)
> title  ->Doc2:::
> content -> pt carrefour indonesia temp price reduct advertising promotion
> reg disc april::: Score ->0.381982
> title  ->Doc1:::
> content -> pt carrefour indonesia temp price reduct advertising promotion
> disc reg::: Score ->0.33834878
>
> When all three documents are added [Doc1, Doc2, Doc3],
> the output is:
> Query (content:pt content:carrefour content:indonesia content:temp
> content:price content:reduct content:advertising content:promotion
> content:anchr content:reg content:disc content:april)
> title  ->Doc1:::
> content -> pt carrefour indonesia temp price reduct advertising promotion
> disc reg::: Score ->0.6635133
> title  ->Doc2:::
> content -> pt carrefour indonesia temp price reduct advertising promotion
> reg disc april::: Score ->0.6422809
> title  ->Doc3:::
> content -> qrst opq april::: Score ->0.010616212
>
>
>
> Thanks

