lucene-general mailing list archives

From "Balaji.A" <reachbalaj...@gmail.com>
Subject Scoring Pattern for partial and exact match search results
Date Wed, 27 Oct 2010 14:36:02 GMT

Hi,

I have 6 fields in a document, with their respective data types given below.

field name       data type
--------------------------
content          text
title            text
description      text
content_em       text_ws
title_em         text_ws
description_em   text_ws

My requirement is to prioritize search results based on exact and partial
match conditions: documents that have an exact match should score higher than
documents with only a partial match.
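Stated as an ordering rule, the requirement is a two-level sort: exact matches first, then partial matches, with the relevance score ordering results within each group. A minimal Python sketch of that rule, assuming a hypothetical `is_exact` flag is already known for each hit (this is not a Lucene or Solr API, only the desired ordering):

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float
    is_exact: bool  # hypothetical flag: True if the whole query matched exactly


def rank(hits):
    # Exact matches sort ahead of partial matches (False < True for
    # "not is_exact"); within each group, higher scores come first.
    return sorted(hits, key=lambda h: (not h.is_exact, -h.score))


hits = [
    Hit("doc1", score=8.5, is_exact=False),
    Hit("doc2", score=1.2, is_exact=True),
]
print([h.doc_id for h in rank(hits)])  # ['doc2', 'doc1']
```

Here doc2 ranks first despite its lower score because it is the exact match.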

To achieve this, I have added 3 fields (content_em, title_em, description_em)
that contain the same content as content, title and description respectively.
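The field layout described above can be sketched as a plain mapping from each source field to its `_em` copy. The helper name `with_exact_match_copies` is hypothetical and the duplication is done client-side here; in Solr the same duplication is typically declared with `copyField` in the schema instead:

```python
# Hypothetical mapping from each analyzed field to its "_em" copy,
# mirroring the field table above.
EXACT_MATCH_COPIES = {
    "content": "content_em",
    "title": "title_em",
    "description": "description_em",
}


def with_exact_match_copies(doc):
    """Return a new document dict in which every source field present in
    `doc` is duplicated, with identical text, into its "_em" counterpart."""
    out = dict(doc)
    for src, dest in EXACT_MATCH_COPIES.items():
        if src in doc:
            out[dest] = doc[src]
    return out


doc = {"id": "doc1", "title": "Ryder Cup", "content": "Cup news from London"}
print(with_exact_match_copies(doc))
```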

My dismax query looks similar to this:

mm=1&qf=content^100+description^200+title^300&pf=content_em^500000+description_em^700000+title_em^900000&fl=id&start=0&q=London&qt=dismax
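The same request can be assembled from its individual parameters, which makes the boosts easier to read and edit. A minimal sketch using only Python's standard library; it builds the query string only and assumes nothing about the Solr host or request path:

```python
from urllib.parse import parse_qs, urlencode

# Parameters from the dismax request above. A space separates the entries
# in qf and pf; urlencode turns it into "+" and encodes "^" as "%5E".
params = {
    "qt": "dismax",
    "q": "London",
    "mm": "1",
    "qf": "content^100 description^200 title^300",
    "pf": "content_em^500000 description_em^700000 title_em^900000",
    "fl": "id",
    "start": "0",
}

query_string = urlencode(params)
print(query_string)

# Decoding the string recovers the original parameter values.
assert parse_qs(query_string)["qf"] == [params["qf"]]
```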

I have 2 problems with this approach:

Problem 1:

For instance, if doc1 has the text "London" appearing once each in the
description, content and title fields, and doc2 has the same text appearing
once only in the description and content fields, doc2 gets a higher score than
doc1. Can anyone explain why this happens? Since I give the title field a
higher boost, I expect a term matching that field to contribute more to the
score.
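One factor that can produce this result: for each query term, dismax builds a Lucene DisjunctionMaxQuery over the qf fields, which scores a document by its best-matching field plus `tie` times the scores of its other matching fields, and each per-field score also depends on that field's idf and length norm. The toy Python model below uses a simplified classic TF-IDF term score (tf * idf^2 * boost * lengthNorm, ignoring queryNorm, coord and norm encoding) with entirely hypothetical collection statistics and field lengths; it is an illustration, not a reproduction of Solr's exact score:

```python
import math


def idf(doc_freq, num_docs):
    # Classic Lucene DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_score(freq, field_len, doc_freq, num_docs, boost):
    # Simplified per-field term score: tf * idf^2 * boost * lengthNorm.
    tf = math.sqrt(freq)
    length_norm = 1.0 / math.sqrt(field_len)
    return tf * idf(doc_freq, num_docs) ** 2 * boost * length_norm


def dismax(scores, tie=0.0):
    # DisjunctionMaxQuery: best field score plus tie * the other matches.
    if not scores:
        return 0.0
    best = max(scores)
    return best + tie * (sum(scores) - best)


NUM_DOCS = 1000
# Hypothetical per-field document frequencies for "London".
DOC_FREQ = {"content": 300, "description": 50, "title": 200}
BOOST = {"content": 100, "description": 200, "title": 300}  # qf boosts


def doc_score(field_lengths, tie=0.0):
    # Each matching field contains "London" once; values are field lengths.
    scores = [
        field_score(1, length, DOC_FREQ[f], NUM_DOCS, BOOST[f])
        for f, length in field_lengths.items()
    ]
    return dismax(scores, tie)


doc1 = {"content": 400, "description": 100, "title": 8}  # hypothetical lengths
doc2 = {"content": 50, "description": 4}

print(round(doc_score(doc1), 1), round(doc_score(doc2), 1))
```

With these made-up numbers doc2 wins because its very short description combines a rarer term in that field (higher idf) with a much larger length norm, and with tie=0.0 only each document's single best field counts, so doc1 earns nothing for matching in three fields instead of two. Solr exposes the tie breaker as the dismax `tie` parameter.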


Problem 2:

Another scenario is a search for the term "Ryder Cup":
Doc 1 has text "Cup" appearing 20 or more times in content field
Doc 2 has text "Ryder Cup" appearing 1 time in title field

On search I expect Doc 2 to be on top, since I want exact-match documents to
be prioritized over partial-match documents. Unfortunately, Doc 1 comes out on
top with a higher score.
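Under Lucene's classic DefaultSimilarity, repetition raises a term's contribution only sub-linearly, while field length pulls it down. A small Python sketch of just those two factors; the field lengths are hypothetical, and the code does not reproduce the full Lucene score (idf, boosts, queryNorm and the pf phrase clause are left out):

```python
import math


def classic_tf(freq):
    # Lucene DefaultSimilarity: tf = sqrt(term frequency in the field).
    return math.sqrt(freq)


def length_norm(num_terms):
    # Lucene DefaultSimilarity: lengthNorm = 1 / sqrt(number of terms in the
    # field), ignoring field/document boosts and the lossy one-byte encoding.
    return 1.0 / math.sqrt(num_terms)


# tf factor for "Cup" occurring 20 times (Doc 1) vs. once (Doc 2).
ratio = classic_tf(20) / classic_tf(1)
print(f"tf advantage from repetition: {ratio:.2f}x")  # tf advantage from repetition: 4.47x

# Hypothetical lengths: a 2-term title vs. a 400-term content field.
norm_ratio = length_norm(2) / length_norm(400)
print(f"lengthNorm advantage of the short title: {norm_ratio:.2f}x")
```

So 20 occurrences are worth roughly 4.5 times one occurrence rather than 20 times, and a long content field is additionally damped by its length norm.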

Since I am new to Lucene, can anyone help me solve these problems?

Many Thanks,
Balaji.

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Scoring-Pattern-for-partial-and-exact-match-search-results-tp1780478p1780478.html
Sent from the Lucene - General mailing list archive at Nabble.com.
