lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Top matched data should be on Top
Date Mon, 20 Feb 2012 17:27:37 GMT
Your example is hard to follow: there are too many words in the query and
the docs. Have you looked at the output from IndexSearcher.explain()? If
you don't like how Lucene scores things, you can write your own
implementation of Similarity.
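For reference, the factors that explain() breaks out come from the Lucene 3.x DefaultSimilarity formula documented in the Similarity javadoc. It is written here in simplified form, with field and document boosts folded into the norm:

```latex
\begin{aligned}
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot\sum_{t\in q}\mathrm{tf}(t,d)\,\mathrm{idf}(t)^2\,\mathrm{boost}(t)\,\mathrm{norm}(t,d)\\
\mathrm{tf}(t,d) &= \sqrt{\mathrm{freq}(t,d)}, \qquad \mathrm{idf}(t) = 1 + \ln\frac{\mathit{numDocs}}{\mathit{docFreq}(t)+1}\\
\mathrm{norm}(t,d) &\approx \frac{1}{\sqrt{\mathit{numTerms}(d)}} \quad\text{(default boosts; stored in one byte, so only coarsely)}\\
\mathrm{coord}(q,d) &= \frac{\mathit{overlap}(q,d)}{\mathit{maxOverlap}(q)}, \qquad
\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\sum_{t\in q}\bigl(\mathrm{idf}(t)\,\mathrm{boost}(t)\bigr)^2}}
\end{aligned}
```

Two consequences matter for this thread. First, idf depends on the whole index, so adding an unrelated document can change the ranking. Second, the byte-encoded norm cannot tell nearby lengths apart.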


--
Ian.


On Sun, Feb 19, 2012 at 5:08 AM, A Z <4azfriend@gmail.com> wrote:
> Hi,
>
> Thanks for your reply.
>
> But if I add one extra word *[abc]* to all three documents and then search,
> I do get the best-matching document on top, which is not the case when I
> remove abc from all the documents and search.
>
> So here, with *abc* added, I get Doc2, the document with the most matching
> words, on top.
>
> *Query (content:pt content:carrefour content:indonesia content:temp
> content:price content:reduct content:advertising content:promotion
> content:anchr content:reg content:disc content:april content:abc)*
>
>
> *title  ->Doc2:::*
> content -> pt carrefour indonesia temp price reduct advertising promotion
> reg disc april  abc::: Score ->*0.6657306*
> *title  ->Doc1:::*
> content -> pt carrefour indonesia temp price reduct advertising promotion
> disc reg abc::: Score ->*0.55722165*
> *title  ->Doc3:::*
> content -> qrst opq april  abc::: Score ->*0.029068843*
>
> So my concern is that the document with the most matched words should be
> on top. When two documents match the same number of words, the shorter
> document should go on top; otherwise the document with the most matched
> words should be on top.
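The ordering rule described above can be pinned down as a comparator. This is a minimal sketch that assumes whitespace tokenization and runs as plain Java post-processing outside Lucene. The names MatchCountRanking, Doc and byMatchesThenLength are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical ranking rule (NOT Lucene's scoring): most distinct query terms
// matched first; on equal counts, the shorter document first.
final class MatchCountRanking {
    static final class Doc {
        final String title;
        final List<String> tokens;

        Doc(String title, String content) {
            this.title = title;
            // Assumption: whitespace tokenization, unlike a real Lucene analyzer.
            this.tokens = Arrays.asList(content.trim().split("\\s+"));
        }

        // Number of distinct query terms that occur in this document.
        int matches(Set<String> queryTerms) {
            Set<String> seen = new HashSet<>(tokens);
            seen.retainAll(queryTerms);
            return seen.size();
        }
    }

    static Comparator<Doc> byMatchesThenLength(Set<String> queryTerms) {
        return Comparator.comparingInt((Doc d) -> d.matches(queryTerms)).reversed()
                .thenComparingInt(d -> d.tokens.size());
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList(
                "pt carrefour indonesia temp price reduct advertising promotion anchr reg disc april abc"
                        .split(" ")));
        List<Doc> docs = new ArrayList<>(Arrays.asList(
                new Doc("Doc1", "pt carrefour indonesia temp price reduct advertising promotion disc reg abc"),
                new Doc("Doc2", "pt carrefour indonesia temp price reduct advertising promotion reg disc april abc"),
                new Doc("Doc3", "qrst opq april abc")));
        docs.sort(byMatchesThenLength(query));
        for (Doc d : docs) {
            System.out.println(d.title + " matches=" + d.matches(query) + " length=" + d.tokens.size());
        }
    }
}
```

Lucene's default scoring does not behave like this, because idf and length normalization also contribute. Inside Lucene, the usual route to something like this rule is the custom Similarity Ian mentions.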
>
>
> On Tue, Feb 14, 2012 at 11:54 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> You cannot simply count words like this and expect the docs to be ordered
>> as you imply. The problem is that the lengths of the fields are encoded
>> in a byte (or perhaps an int, I forget). Thus, some loss of precision
>> is inherent in the process: you have to encode values from 1 to 2^31
>> or so in something much smaller than a long.
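Erick's precision point can be made concrete. The sketch below is modeled on Lucene 3.x's one-byte norm encoding (SmallFloat.floatToByte315 / byte315ToFloat). It is a re-implementation for illustration, not Lucene's own class, and assumes it mirrors that encoding. It keeps a float's exponent but only three significant mantissa bits. The class name NormEncoding is hypothetical.

```java
// Illustrative re-implementation (assumed to mirror Lucene 3.x SmallFloat's "315"
// encoding): exponent kept, mantissa truncated to 2 explicit bits (3 significant
// bits counting the implicit leading 1).
final class NormEncoding {
    private static final int MANTISSA_BITS = 3;
    private static final int ZERO_EXP = 15;
    private static final int FZERO = (63 - ZERO_EXP) << MANTISSA_BITS; // 384

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - MANTISSA_BITS); // drop all but the top 2 mantissa bits
        if (smallfloat <= FZERO) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow: zero or the smallest positive value
        }
        if (smallfloat >= FZERO + 0x100) {
            return (byte) -1; // overflow: the largest representable value
        }
        return (byte) (smallfloat - FZERO);
    }

    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - MANTISSA_BITS);
        bits += (63 - ZERO_EXP) << 24;
        return Float.intBitsToFloat(bits);
    }

    // DefaultSimilarity-style length norm: 1 / sqrt(number of tokens in the field).
    static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 12; n++) {
            float exact = lengthNorm(n);
            System.out.printf("%2d tokens: exact %.4f -> stored %.4f%n", n, exact, decode(encode(exact)));
        }
    }
}
```

With the 1/sqrt(numTokens) length norm, 10 tokens (Doc1) round to 0.3125. However, 11 and 12 tokens (Doc2, and Doc1 once abc is added) both round to 0.25, so those documents look equally long to the scorer.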
>>
>> So try attaching &debugQuery=on and examining the output, you'll probably
>> see that the scores are identical, in which case Solr breaks the ties by
>> document insertion order (roughly). And looking closely at the debug
>> information, I suspect you'll see that the length normalization is
>> the same.
>>
>> Best
>> Erick
>>
>> On Tue, Feb 14, 2012 at 1:16 PM, A Z <4azfriend@gmail.com> wrote:
>> > Hi,
>> >
>> > When I add three documents, I don't get the best-matching text on top,
>> > but when I have only two documents it is displayed properly, as shown in
>> > the following output.
>> >
>> > I'm using only the default similarity, with Lucene 3.1.
>> > *Adding the following documents*
>> >
>> >     writer.addDocument(createDocument("Doc1",
>> >         "pt carrefour indonesia temp price reduct advertising promotion disc reg"));
>> >
>> >     writer.addDocument(createDocument("Doc2",
>> >         "pt carrefour indonesia temp price reduct advertising promotion reg disc april"));
>> >
>> > If I uncomment Doc3 and search for the same string, I get Doc1 on top,
>> > but when I comment Doc3 out, I get Doc2 on top.
>> > What I want is that, irrespective of the number of documents, the
>> > best-matching document should be on top. Here Doc2 has the most matching
>> > text: compared to Doc1 it contains the extra word april, so Doc2 should
>> > always be on top.
>> >
>> >     // writer.addDocument(createDocument("Doc3", "qrst opq april")); // document 3
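The reason Doc3 flips the order shows up in DefaultSimilarity's idf, 1 + ln(numDocs / (docFreq + 1)), which depends on the whole index rather than on the document being scored. Here is a small sketch of that formula; the class name IdfShift is hypothetical.

```java
// DefaultSimilarity-style idf (Lucene 3.x): 1 + ln(numDocs / (docFreq + 1)).
final class IdfShift {
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Two docs (Doc1, Doc2): shared terms occur in both, "april" only in Doc2.
        System.out.printf("2 docs: shared idf=%.4f, april idf=%.4f%n", idf(2, 2), idf(1, 2));
        // Three docs: Doc3 ("qrst opq april") also contains "april".
        System.out.printf("3 docs: shared idf=%.4f, april idf=%.4f%n", idf(2, 3), idf(2, 3));
    }
}
```

With two documents, every shared term gets idf ≈ 0.5945, while april gets 1.0. Because the score uses idf squared, april alone is worth nearly three shared terms to Doc2. Once Doc3 also contains april, every matching term has idf 1.0 and april's bonus disappears, so the length norm decides between Doc1 and Doc2.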
>> >
>> >
>> > *Searching with the following text*
>> > *"pt carrefour indonesia temp price reduct advertising promotion anchr reg
>> > disc april"*
>> >
>> > *When adding only two documents [Doc1, Doc2], the output is*
>> > Query (content:pt content:carrefour content:indonesia content:temp
>> > content:price content:reduct content:advertising content:promotion
>> > content:anchr content:reg content:disc content:april)
>> > title  ->Doc2:::
>> > content -> pt carrefour indonesia temp price reduct advertising promotion
>> > reg disc *april*::: *Score ->0.381982*
>> > title  ->Doc1:::
>> > content -> pt carrefour indonesia temp price reduct advertising promotion
>> > disc reg::: *Score ->0.33834878*
>> >
>> > *When adding all three documents [Doc1, Doc2, Doc3], the output is*
>> > Query (content:pt content:carrefour content:indonesia content:temp
>> > content:price content:reduct content:advertising content:promotion
>> > content:anchr content:reg content:disc content:april)
>> > title  ->Doc1:::
>> > content -> pt carrefour indonesia temp price reduct advertising promotion
>> > disc reg::: *Score ->0.6635133*
>> > title  ->Doc2:::
>> > content -> pt carrefour indonesia temp price reduct advertising promotion
>> > reg disc *april*::: *Score ->0.6422809*
>> > title  ->Doc3:::
>> > content -> qrst opq april::: *Score ->0.010616212*
>> >
>> >
>> >
>> > Thanks
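Combining coord, queryNorm, idf squared and the byte-rounded length norm accounts for the posted scores. The sketch below makes several assumptions: whitespace tokenization, each term occurring at most once per document (so tf = 1), and all boosts equal to 1. The norm rounding clears all but the top two mantissa bits, which matches the one-byte encoding for values in this range. ClassicScoreSketch is a hypothetical name, not a Lucene class.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of Lucene 3.x DefaultSimilarity scoring for a flat OR query of single terms.
// Assumptions: whitespace tokens, tf = 1 for every match, all boosts = 1.
final class ClassicScoreSketch {
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // 1/sqrt(n) after a lossy one-byte round trip: keep exponent + top 2 mantissa bits.
    static double storedNorm(int numTokens) {
        float exact = (float) (1.0 / Math.sqrt(numTokens));
        int bits = Float.floatToRawIntBits(exact) & ~((1 << 21) - 1);
        return Float.intBitsToFloat(bits);
    }

    static Map<String, Double> score(List<String> query, Map<String, String> docs) {
        int numDocs = docs.size();
        Map<String, Set<String>> terms = new HashMap<>();
        Map<String, Integer> lengths = new HashMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            String[] toks = e.getValue().trim().split("\\s+");
            terms.put(e.getKey(), new HashSet<>(Arrays.asList(toks)));
            lengths.put(e.getKey(), toks.length);
        }
        // idf per query term; queryNorm = 1/sqrt(sum of idf^2), including terms no doc contains.
        Map<String, Double> idfs = new HashMap<>();
        double sumSq = 0;
        for (String t : query) {
            int df = 0;
            for (Set<String> s : terms.values()) if (s.contains(t)) df++;
            double w = idf(df, numDocs);
            idfs.put(t, w);
            sumSq += w * w;
        }
        double queryNorm = 1.0 / Math.sqrt(sumSq);
        Map<String, Double> scores = new HashMap<>();
        for (String doc : docs.keySet()) {
            double sum = 0;
            int overlap = 0;
            for (String t : query) {
                if (terms.get(doc).contains(t)) {
                    overlap++;
                    sum += idfs.get(t) * idfs.get(t) * storedNorm(lengths.get(doc)); // tf = 1
                }
            }
            double coord = overlap / (double) query.size(); // unmatched clauses still count
            scores.put(doc, coord * queryNorm * sum);
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList(("pt carrefour indonesia temp price reduct advertising "
                + "promotion anchr reg disc april").split(" "));
        Map<String, String> docs = new HashMap<>();
        docs.put("Doc1", "pt carrefour indonesia temp price reduct advertising promotion disc reg");
        docs.put("Doc2", "pt carrefour indonesia temp price reduct advertising promotion reg disc april");
        System.out.println("2 docs: " + score(query, docs));
        docs.put("Doc3", "qrst opq april");
        System.out.println("3 docs: " + score(query, docs));
    }
}
```

Run as-is, it should print scores that agree with the posted ones to within float rounding. With two documents, Doc2 ≈ 0.3820 ranks above Doc1 ≈ 0.3383. With three, the scores are Doc1 ≈ 0.6635, Doc2 ≈ 0.6423 and Doc3 ≈ 0.0106. Adding abc to each document and to the query reproduces the follow-up's 0.6657, 0.5572 and 0.0291 in the same way. The query term anchr matches nothing, yet it still counts in both coord and queryNorm.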
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
