lucenenet-user mailing list archives

From Simon Svensson <si...@devhost.se>
Subject Re: Minimize document hits based on number of matching terms between source text terms and document field terms
Date Thu, 09 May 2013 16:19:02 GMT
Hi,

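QueryParser.Parse returns a BooleanQuery when you give it several
terms; with the default OR operator each term becomes an optional
(SHOULD) clause. Set MinimumNumberShouldMatch on that query to require
a minimum number of those clauses to match, which is the behavior you
want.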

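// Parse(...) returns the base Query type, so cast before setting the property.
var query = queryParser.Parse(...);
var boolQuery = query as BooleanQuery;
if (boolQuery != null) {
     // Require at least two of the optional (OR'ed) terms to match.
     boolQuery.MinimumNumberShouldMatch = 2;
}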
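
For a fuller picture, here is a minimal end-to-end sketch using the
sample names from the question. It is only an illustration under some
assumptions that are not in the original message: the Lucene.Net 3.0.x
API, an in-memory RAMDirectory, StandardAnalyzer, a field called "name",
and the search text written as "Joe P H Arnold" (initials separated by
spaces so each becomes its own term).

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

class MinShouldMatchSketch
{
    static void Main()
    {
        // Assumption: Lucene.Net 3.0.x with StandardAnalyzer and an in-memory index.
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);
        var directory = new RAMDirectory();

        // Sample names taken from the question; "name" is an assumed field name.
        string[] names =
        {
            "Mr. Youness Rokven",
            "Mr. Joe Paul Harry Arnold",
            "Mr. Paul B. Mitchell",
            "Mrs. Fernanda Joe Mitchell",
            "Ms. Jade Paula Victoria Muir",
            "Mr. Joe Harvey Pope"
        };

        using (var writer = new IndexWriter(directory, analyzer,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (var name in names)
            {
                var doc = new Document();
                doc.Add(new Field("name", name, Field.Store.YES, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }
        }

        // The default operator is OR, so every term is an optional clause.
        var parser = new QueryParser(Version.LUCENE_30, "name", analyzer);
        var query = parser.Parse("Joe P H Arnold");

        // Only a multi-term parse yields a BooleanQuery, hence the null check.
        var boolQuery = query as BooleanQuery;
        if (boolQuery != null)
        {
            boolQuery.MinimumNumberShouldMatch = 2;
        }

        using (var searcher = new IndexSearcher(directory, true))
        {
            var hits = searcher.Search(query, 10);
            foreach (var hit in hits.ScoreDocs)
            {
                // Print the score and the stored name of each remaining hit.
                Console.WriteLine("{0:F3}  {1}", hit.Score,
                                  searcher.Doc(hit.Doc).Get("name"));
            }
        }
    }
}
```

With the minimum set to 2, names that share only "joe" with the query
should drop out of the results. Note that MinimumNumberShouldMatch only
counts optional clauses; required (+) and prohibited (-) terms are not
affected by it.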

// Simon

On 2013-05-09 18:08, Allan, Brad (Bracknell) wrote:
> I'd like to get any comments about how I might do this - I have listed some options below,
> which of course I'll investigate...
>
> Example first:
> Name Field
> --------------
> Mr. Youness Rokven
>
> Mr. Joe Paul Harry Arnold
>
> Mr. Paul B. Mitchell
>
> Mrs. Fernanda Joe Mitchell
>
> Ms. Jade Paula Victoria Muir
>
> Mr. Joe Harvey Pope
>
>
> If I search the above with text such as "Joe P.H. Arnold" which is turned into a query:
> ((Joe) or (P) or (H) or (Arnold))
>
> I get hits:
> Mr. Joe Paul Harry Arnold
>
> Mrs. Fernanda Joe Mitchell
>
> Mr. Joe Harvey Pope
>
>
> And the scores are great! The top hit having a higher relative score.
>
> What I'd like to do is exclude hits where, say, fewer than 2 terms matched the document
> field terms.
>
> Options I think:
>
> 1.)    Override DefaultSimilarity?
>
> 2.)    Construct awkward searches, example:
>
> ((Joe) and (P)) or ((Joe) and (H)) or ((Joe) and (Arnold))   etc ... all the possible
> combinations
>
> 3.)    Use TermVector information? Don't know much about this, but my thought is that
> if highlighting knows the matching terms,...perhaps I use that?
>
> Would be grateful for comments.
> Thanks!

