lucene-solr-user mailing list archives

From Bharat Jain <bharat.j...@gmail.com>
Subject Re: question about relevance
Date Fri, 30 Jul 2010 14:40:19 GMT
Hi,
   Thanks a lot for the info and your time. I think field collapsing will work
for us. I looked at https://issues.apache.org/jira/browse/SOLR-236, but
which file should I use for the patch? We use Solr 1.3.

Thanks
Bharat Jain


On Fri, Jul 30, 2010 at 12:53 AM, Chris Hostetter
<hossman_lucene@fucit.org> wrote:

>
> : 1. There are user records of type A, B, C etc. (userId field in index is
> : common to all records)
> : 2. A user can have any number of A, B, C etc (e.g. think of A being a
> : language then user can know many languages like french, english, german
> : etc)
> : 3. Records are currently stored as a document in index.
> : 4. A given query can match multiple records for the user
> : 5. If for a user more records are matched (e.g. if he knows both french
> : and german) then he is more relevant and should come top in UI. This is
> : the reason I wanted to add lucene scores assuming the greater score means
> : more relevance.
>
> if your goal is to get back "users" from each search, then you should
> probably change your indexing strategy so that each "user" has a single
> document -- fields like "language" can be multivalued, etc...
>
> then a search for "language:en language:fr" will return users who speak
> english or french, and the ones that speak both will score higher.
>
> if you really can't change the index structure, then essentially what you
> are looking for is a "field collapsing" solution on the userId field,
> where you want each collapsed group to get a cumulative score.  i don't
> know if the existing field collapsing patches support this -- if you are
> already willing/capable to do it in the client then that may be the
> simplest thing to support moving forward.
>
> Adding the scores is certainly one metric you could use -- it's generally
> suspicious to try and imply too much meaning to scores in lucene/solr but
> that's because people typically try to imply broader absolute meaning.  in
> the case of a single query the scores are relative to each other, and adding
> up all the scores for a given userId is approximately what would happen in
> my example above -- except that there is also a "coord" factor that would
> penalize documents that only match one clause ... it's complicated, but
> as an approximation adding the scores might give you what you are looking
> for -- only you can know for sure based on your specific data.
>
>
>
> -Hoss
>
>
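
The client-side approach described above (adding up the scores of all
matching records for each userId) could be sketched roughly as follows. This
is only an illustration under stated assumptions: the (userId, score) pair
shape, the function name and the sample values are hypothetical, and no Solr
client API is used. It also leaves out the "coord" factor mentioned above, so
the ordering only approximates what a one-document-per-user index would give.

```python
from collections import defaultdict


def rank_users_by_summed_score(hits):
    """Sum per-record scores by user and rank users by the total.

    `hits` is assumed to be an iterable of (user_id, score) pairs taken
    from the result list of ONE query; scores from different queries are
    not comparable and should not be mixed.
    """
    totals = defaultdict(float)
    for user_id, score in hits:
        totals[user_id] += score
    # Highest cumulative score first; ties broken by user id so the
    # output order is deterministic.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical hits: user "u1" matched two records (e.g. french and german).
hits = [("u1", 0.5), ("u2", 0.75), ("u1", 0.5), ("u3", 0.25)]
ranked = rank_users_by_summed_score(hits)
```

With these made-up values, "u1" ranks first because its two matching records
add up to more than the single record matched for "u2".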
