lucenenet-user mailing list archives

From "Allan, Brad (Bracknell)" <Brad.Al...@Fiserv.com>
Subject Minimize document hits based on number of matching terms between source text terms and document field terms
Date Thu, 09 May 2013 16:08:41 GMT
I'd like to get any comments about how I might do this. I have listed some options below, which
of course I'll investigate...

Example first:
Name Field
--------------
Mr. Youness Rokven

Mr. Joe Paul Harry Arnold

Mr. Paul B. Mitchell

Mrs. Fernanda Joe Mitchell

Ms. Jade Paula Victoria Muir

Mr. Joe Harvey Pope


If I search the above with text such as "Joe P.H. Arnold", which is turned into the query:
((Joe) or (P) or (H) or (Arnold))

I get hits:
Mr. Joe Paul Harry Arnold

Mrs. Fernanda Joe Mitchell

Mr. Joe Harvey Pope


And the scores are great! The top hit has a higher relative score than the others.

What I'd like to do is exclude hits where, say, fewer than 2 of the query terms matched the
document field terms.
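
As a minimal sketch of the intended behaviour (plain Python, not Lucene.NET code), the
filter can be pictured as counting the distinct query terms found in each name and dropping
names below a threshold. The tokenizer, the function names and the min_terms parameter are
hypothetical and only approximate what an analyzer would do:

```python
import re

# The example "Name Field" values from this message.
NAMES = [
    "Mr. Youness Rokven",
    "Mr. Joe Paul Harry Arnold",
    "Mr. Paul B. Mitchell",
    "Mrs. Fernanda Joe Mitchell",
    "Ms. Jade Paula Victoria Muir",
    "Mr. Joe Harvey Pope",
]


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric terms.

    Assumption: this stands in for whatever analyzer the index really uses.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def hits_with_min_terms(query_text, names, min_terms=2):
    """Return (name, matched_terms) for names matching at least min_terms query terms."""
    query_terms = tokenize(query_text)
    results = []
    for name in names:
        matched = query_terms & tokenize(name)
        if len(matched) >= min_terms:
            results.append((name, sorted(matched)))
    return results


if __name__ == "__main__":
    for name, matched in hits_with_min_terms("Joe P.H. Arnold", NAMES):
        print(name, matched)
```

With min_terms=1 this keeps the same three names as the OR query above; with min_terms=2
only "Mr. Joe Paul Harry Arnold" remains, since "joe" and "arnold" both match.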

Options I think:

1.)    Override DefaultSimilarity?

2.)    Construct awkward searches, example:

((Joe) and (P)) or ((Joe) and (H)) or ((Joe) and (Arnold))   etc ... all the possible combinations

3.)    Use TermVector information? I don't know much about this, but my thought is that if highlighting
knows the matching terms, perhaps I can use that?
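
For option 2, the "all the possible combinations" query could be generated rather than
written by hand. The following is a rough sketch in plain Python (not Lucene.NET code); the
function name min_match_query and its parameters are hypothetical, and it only builds a
query string using the AND/OR operators of the standard query syntax:

```python
from itertools import combinations


def min_match_query(terms, min_terms=2):
    """Build an OR of AND-groups, one group per combination of min_terms terms.

    A document satisfies the result only if it contains every term of at
    least one group, i.e. at least min_terms of the original terms.
    """
    groups = []
    for combo in combinations(terms, min_terms):
        groups.append("(" + " AND ".join(f"({t})" for t in combo) + ")")
    return " OR ".join(groups)


if __name__ == "__main__":
    print(min_match_query(["Joe", "P", "H", "Arnold"]))
```

For the four example terms this yields six AND-groups, starting with
((Joe) AND (P)) OR ((Joe) AND (H)) OR ((Joe) AND (Arnold)). The number of groups grows
combinatorially with the number of terms, which is why this option feels awkward.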

Would be grateful for comments.
Thanks!



________________________________

CheckFree Solutions Limited (trading as Fiserv)
Registered Office: Eversheds House, 70 Great Bridgewater Street, Manchester, M15 ES
Registered in England: No. 2694333
