lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: question regarding dismax query results
Date Tue, 31 Dec 2013 13:06:55 GMT
Hi Vulcanoid,

If you want to consider proximity, you need to use the pf (phrase fields) and ps (phrase slop)
parameters. Please see:

http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_search_for_one_term_near_another_term_.28say.2C_.22batman.22_and_.22movie.22.29


P.S. edismax has more fine grained control over this via pf2 pf3 parameters.
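
As a rough sketch of what that could look like for the query quoted below, the following builds a dismax request URL with pf and ps added. The helper name build_dismax_url, the pf boost of 2.0, and ps=0 are illustrative assumptions, not tuned or recommended values; only the host, core name, and qf fields come from the original query.

```python
from urllib.parse import urlencode


def build_dismax_url(base_url, query, qf, pf=None, ps=None):
    """Build a Solr dismax select URL (hypothetical helper for illustration).

    qf: fields (with boosts) that individual terms are matched against.
    pf: fields in which the whole query, matched as a phrase, earns an extra boost.
    ps: phrase slop applied to the pf phrase match.
    """
    params = {"q": query, "defType": "dismax", "qf": qf}
    if pf is not None:
        params["pf"] = pf
    if ps is not None:
        params["ps"] = str(ps)
    # urlencode percent-encodes the values (spaces become '+', '^' becomes '%5E').
    return base_url + "?" + urlencode(params)


url = build_dismax_url(
    "http://localhost:8983/solr/itat/select",
    "additional depreciation",
    qf="assessee^0.3 itat_order^0.2",
    pf="itat_order^2.0",  # assumed boost; adjust to taste
    ps=0,                 # assumed slop; 0 rewards only adjacent, in-order terms
)
print(url)
```

With pf set, documents where the two words appear together in itat_order should receive an additional phrase boost on top of the per-term qf score; raising ps loosens how close the words need to be.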


On Tuesday, December 31, 2013 12:36 PM, Vulcanoid Developer <vulcanoid@vulcantechsoftware.com>
wrote:
Hi,

I have a solr schema which has fields related to Indian legal judgments and
want to provide a search engine on top of them.  I came across a problem
which I thought I would take the group's advice on.

For discussion's sake let us assume there are only two fields "assessee" and
"itat_order" which are text fields; the latter has the entire judgment of
the court in text form.

Now I search using dismax against these 2 fields using a query like below

http://localhost:8983/solr/itat/select?q=additional+depreciation&start=20&rows=30&fl=assessee%2C+itat_order%2C+score&wt=xml&indent=true&defType=dismax&qf=assessee^0.3+itat_order^0.2


For such a dismax query on the words additional depreciation (2 words without
quotes), results in which additional and depreciation occur separately get a
higher score than results in which the words additional depreciation occur
immediately together.  Why does this happen?

Shouldn't we ideally be getting exact matches of additional depreciation
first and then matches which have both the words but apart from each other
after these exact matches?  (In general when I search for A B shouldn't I
get matches with A B as they appear first and then A and B separated by
distance or singly occurring?)

Below I have pasted the score and # of occurrences given for three results;
if you want I can share the text fields in these cases too.

(Also, for what it's worth, the solr index uses only a
whitespacerfilterfactory and lowercasefilterfactory for querying and
indexing)

thanks
Vulcanoid

"""
decision of Heatshrink Technologies
       score                    : 0.083743244
       additional depreciation  : 0 occurrences
       additional               : 2 occurrences
       depreciation             : 27 occurrences


decision of Srinivasa Raju
       score                    : 0.08313061
       additional depreciation  : 0 occurrences
       additional               : 5 occurrences
       depreciation             : 30 occurrences


decision of Nani Agro Foods
       score                    : 0.08217349
       additional depreciation  : 5 occurrences
       additional               : 5 occurrences
       depreciation             : 5 occurrences
"""
