lucene-solr-user mailing list archives

From Franck Brisbart <fbrisb...@techmedianetwork.com>
Subject RE: Interesting search question! How to match documents based on the least number of fields that match all query terms?
Date Thu, 23 Jan 2014 08:12:36 GMT
Hi Daniel,

You could also consider negative boosts.
Solr does not support them directly, but you can get the same effect by
boosting the docs which don't match a given metadata field.

This might do what you want:
-metadata1:(term1 AND ... AND termN)^2
-metadata2:(term1 AND ... AND termN)^2
.....
-metadataN:(term1 AND ... AND termN)^2
allMetadatas:(term1 AND ... AND termN)^0.5
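
A minimal sketch of how such a query string could be assembled, in Python. It rests on assumptions that are not part of the message above: each negated clause is wrapped as `(*:* -field:(...))^boost`, a common way to boost non-matching documents, since a purely negative nested clause matches nothing on its own; the allMetadatas clause is marked required with `+` so that only documents containing all the terms are returned; and the terms are plain tokens that need no escaping. The function names are hypothetical.

```python
def boost_non_matching(field, terms, boost):
    """Clause that boosts documents where `field` does NOT contain all terms."""
    all_terms = " AND ".join(terms)
    # "*:*" matches every document; subtracting the field clause leaves the
    # documents that do not match it, and the boost applies to those.
    return f"(*:* -{field}:({all_terms}))^{boost}"


def build_query(metadata_fields, terms, all_field="allMetadatas",
                field_boost=2, all_boost=0.5):
    """Join one boosted clause per metadata field with the required
    all-terms clause on the concatenated field (assumption: '+' prefix)."""
    clauses = [boost_non_matching(f, terms, field_boost) for f in metadata_fields]
    all_terms = " AND ".join(terms)
    clauses.append(f"+{all_field}:({all_terms})^{all_boost}")
    return " ".join(clauses)


query = build_query(["metadata1", "metadata2"], ["term1", "term2"])
print(query)
```

The boosts are the placeholder values from the query above and would need tuning against real data.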


Franck Brisbart



On Wednesday, 22 January 2014 at 19:38 +0000, Petersen, Robert wrote:
> Hi Daniel,
> 
> How about trying something like this (you'll have to play with the boosts to tune it):
> search all the fields with all the terms using edismax and the minimum should match (mm)
> parameter, but require all terms to match in the allMetadatas field.
> https://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29
> 
> The Lucene query syntax below gives the general idea. Note that this query requires
> all terms to be in the same metadata field for that field's boost to apply.
> 
> metadata1:(term1 AND ... AND termN)^2
> metadata2:(term1 AND ... AND termN)^2
> .....
> metadataN:(term1 AND ... AND termN)^2
> allMetadatas:(term1 AND ... AND termN)^0.5
> 
> That should do approximately what you want,
> Robi
> 
> -----Original Message-----
> From: Daniel Shane [mailto:shaned@lexum.com] 
> Sent: Tuesday, January 21, 2014 8:42 AM
> To: solr-user@lucene.apache.org
> Subject: Interesting search question! How to match documents based on the least number of fields that match all query terms?
> 
> I have an interesting Solr/Lucene question, and it's quite possible that some new features
> in Solr might make this much easier than what I am about to try. If anyone has a clever idea
> on how to do this search, please let me know!
> 
> Basically, let's say that I have an index in which each document has a content field and
> several metadata fields.
> 
> Document Fields:
> 
> content
> metadata1
> metadata2
> .....
> metadataN
> allMetadatas (all the terms indexed in metadata1...N are concatenated in this field)
> 
> Assuming that I am searching for documents that contain a certain number of terms (term1
> to termN) in their metadata fields, I would like to build a search query that will return
> documents that satisfy these requirements:
> 
> a) All search terms must be present in a metadata field. This is quite easy: we can simply
> search in the allMetadatas field and that will work fine.
> 
> b) Now for the hard part: we prefer documents in which the search terms are found in the *least
> number of different fields*. So if one document contains all the search terms spread over 10
> different fields, but another document contains all the search terms in only 8 fields, we would
> like the latter to sort first.
> 
> My first idea was to index the terms in allMetadatas using payloads. Each indexed term
> would also carry the specific metadataN field from which it originates. Then I can write a
> scorer to score based on these payloads.
> 
> However, if there is a way to do this without payloads I'm all ears!
> 
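The ranking rule in requirements a) and b) can be illustrated outside of Solr with a small Python sketch. It is only one reading of the question, under assumptions that do not come from the thread: a field "counts" when it contains at least one query term, field values are tokenized on whitespace, and documents are plain dictionaries. The function names and sample documents are hypothetical.

```python
def matches_all(doc, terms, all_field="allMetadatas"):
    """Requirement a): every query term appears in the concatenated field."""
    return set(terms) <= set(doc.get(all_field, "").split())


def fields_used(doc, terms, metadata_fields):
    """Number of metadata fields that contain at least one query term."""
    wanted = set(terms)
    return sum(1 for f in metadata_fields if wanted & set(doc.get(f, "").split()))


def rank(docs, terms, metadata_fields):
    """Requirement b): keep full matches, fewest contributing fields first."""
    hits = [d for d in docs if matches_all(d, terms)]
    return sorted(hits, key=lambda d: fields_used(d, terms, metadata_fields))


docs = [
    {"id": "A", "metadata1": "term1", "metadata2": "term2",
     "allMetadatas": "term1 term2"},
    {"id": "B", "metadata1": "term1 term2", "metadata2": "other",
     "allMetadatas": "term1 term2 other"},
    {"id": "C", "metadata1": "term1", "allMetadatas": "term1"},
]
ranked_ids = [d["id"] for d in rank(docs, ["term1", "term2"], ["metadata1", "metadata2"])]
print(ranked_ids)  # ['B', 'A']
```

Document C is dropped because it lacks term2, and B sorts ahead of A because its terms sit in one field instead of two. A payload-based scorer would need the same per-term field information at query time.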


