lucenenet-user mailing list archives

From "Allan, Brad (Bracknell)" <>
Subject Minimize document hits based on number of matching terms between source text terms and document field terms
Date Thu, 09 May 2013 16:08:41 GMT
I'd like to get any comments about how I might do this. I have listed some options below, which
of course I'll investigate...

Example first:
Name Field
Mr. Youness Rokven

Mr. Joe Paul Harry Arnold

Mr. Paul B. Mitchell

Mrs. Fernanda Joe Mitchell

Ms. Jade Paula Victoria Muir

Mr. Joe Harvey Pope

If I search the above with text such as "Joe P.H. Arnold", which is turned into the query:
((Joe) or (P) or (H) or (Arnold))

I get hits:
Mr. Joe Paul Harry Arnold

Mrs. Fernanda Joe Mitchell

Mr. Joe Harvey Pope

And the scores are great! The top hit has a higher relative score.

What I'd like to do is exclude hits where, say, fewer than 2 of the query terms matched the document field.
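
To make the desired behaviour concrete, here is a minimal sketch in plain Python (not Lucene.NET code) that counts how many query terms each name contains and keeps only names reaching a threshold. The helper names `tokenize` and `hits_with_min_matches` are hypothetical, and the lowercasing and splitting on non-alphanumeric characters are assumptions about how the field is analyzed.

```python
import re

# The example names from the message above.
NAMES = [
    "Mr. Youness Rokven",
    "Mr. Joe Paul Harry Arnold",
    "Mr. Paul B. Mitchell",
    "Mrs. Fernanda Joe Mitchell",
    "Ms. Jade Paula Victoria Muir",
    "Mr. Joe Harvey Pope",
]


def tokenize(text):
    """Assumed analysis: split on non-alphanumerics and lowercase."""
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t}


def hits_with_min_matches(query_text, names, min_matches=2):
    """Return (name, matched_terms) for names sharing at least
    min_matches distinct terms with the query text."""
    query_terms = tokenize(query_text)
    results = []
    for name in names:
        matched = query_terms & tokenize(name)
        if len(matched) >= min_matches:
            results.append((name, sorted(matched)))
    return results


if __name__ == "__main__":
    for name, matched in hits_with_min_matches("Joe P.H. Arnold", NAMES):
        print(name, matched)
```

With a threshold of 1 this sketch returns the same three hits listed above; with a threshold of 2 only "Mr. Joe Paul Harry Arnold" remains, because "joe" and "arnold" both match while "p" and "h" match nothing exactly.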

Options I can think of:

1.)    Override DefaultSimilarity?

2.)    Construct awkward searches, for example:

((Joe) and (P)) or ((Joe) and (H)) or ((Joe) and (Arnold))   etc ... all the possible combinations

3.)    Use TermVector information? I don't know much about this, but my thought is that if highlighting
knows which terms matched, perhaps I could use that?
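
For option 2, the combinations do not have to be written by hand. The following is a rough sketch in plain Python that builds the query string for "at least N terms must match" by OR-ing together every N-term AND group. The function name `min_match_query` is hypothetical, and the uppercase `AND`/`OR` operators assume the query string will be fed to the classic Lucene query parser.

```python
from itertools import combinations


def min_match_query(terms, min_matches=2):
    """Build a query string that requires at least min_matches of the
    given terms, by OR-ing every min_matches-sized AND group."""
    groups = []
    for combo in combinations(terms, min_matches):
        # One AND group per combination, e.g. ((Joe) AND (P))
        groups.append("(" + " AND ".join(f"({t})" for t in combo) + ")")
    return " OR ".join(groups)


if __name__ == "__main__":
    print(min_match_query(["Joe", "P", "H", "Arnold"], min_matches=2))
```

For the four terms in the example this produces 6 AND groups. The number of groups is the binomial coefficient C(n, k) for n terms and a minimum of k matches, so the query grows quickly for longer names, which is why this option feels awkward.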

Would be grateful for comments.


CheckFree Solutions Limited (trading as Fiserv)
Registered Office: Eversheds House, 70 Great Bridgewater Street, Manchester, M15 ES
Registered in England: No. 2694333
