lucene-java-user mailing list archives

From "N. Hira" <>
Subject Re: How to implement cut of score ?
Date Mon, 13 Aug 2007 19:43:45 GMT

If I understand the problem correctly, it is:  given a [job
description], find [candidates] that we would not otherwise find.  That
seems to be a "user-weighted similarity" problem more than a simple
search problem.

1.  Given a [job description], create a set of queries that look for
"the most important terms" in it, matching either those specific terms
or their synonyms.
2.  Use these queries to create a UNION SET of Documents.
3.  Show "the most relevant" documents.
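
Steps 1 and 2 can be sketched roughly as follows.  This is a plain
Python illustration, not Lucene code: the synonym table, the stopword
list, and every function name here are hypothetical, and "most
important" is approximated by raw term frequency only.  In Lucene, each
term group would more likely become a BooleanQuery of optional (SHOULD)
clauses.

```python
from collections import Counter

# Hypothetical synonym table; a real system would load a curated thesaurus.
SYNONYMS = {
    "java": {"j2ee", "jvm"},
    "programmer": {"developer", "engineer"},
}

PUNCTUATION = ".,;:()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (raw.strip(PUNCTUATION).lower() for raw in text.split())
    return [t for t in tokens if t]


def important_terms(job_description, stopwords, top_n=3):
    """Step 1a: pick the most frequent non-stopword terms (an assumption)."""
    counts = Counter(t for t in tokenize(job_description) if t not in stopwords)
    return [term for term, _ in counts.most_common(top_n)]


def build_queries(terms):
    """Step 1b: one query per term, expanded with its synonyms."""
    return [{term} | SYNONYMS.get(term, set()) for term in terms]


def union_of_matches(queries, resumes):
    """Step 2: union of every resume that matches at least one query."""
    matched = set()
    for query in queries:
        for doc_id, text in resumes.items():
            if query & set(tokenize(text)):
                matched.add(doc_id)
    return matched
```

Note that the union says nothing about which of the matched resumes is
the better candidate; that is still step 3.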

The way I see it, Lucene only helps with 2. above.  For 3., ranking by
score would surface "documents that match", but not necessarily "better
candidates".  For example, when I'm trying to fill a Junior Java
programmer position and looking for someone with 2 years of Java
experience, whether someone with 15 years of Java experience is a "good
fit" is subjective.

If I had to get this done quickly, one approach would be to simply
cluster the documents found at 2. using Carrot2.  This would allow the
user to explore the "fuzzy" results.  Better than nothing, but not
really the right solution.

Another approach would be to parse the resumes to tabulate attributes
such as:
1.  overall experience
2.  skills mentioned
3.  experience with a particular skill
4.  ...

Then, you could use this information to evaluate similarity with respect
to the hard criteria and, for example, inform that ranking with tf-idf
weights for the skills that are being sought.

Good luck!

Hira, N.R.
Solutions Architect
Cognocys, Inc.

On Mon, 2007-08-13 at 14:40 -0400, Donna L Gresh wrote:
> Hoss wrote:
> this would be meaningless even if it were easier...
> FAQ: "Can I filter by score?"
> -Hoss
> I've read the warnings referenced there; but still have a problem to 
> solve. We have "fact-based" information about
> people and the jobs they might fill (availability dates, experience level, 
> languages spoken, etc.) and we have textual
> information about both the jobs and the people (e.g. resumes). We'd like 
> to use the "goodness of match" of the
> textual description of the job to the person's resume as a way to suggest, 
> for example, additional people who
> should be considered for the job, even if, say, their specific job title 
> does not match the requested job title.
> I can use the job description to construct a query (and I've done it in a 
> variety of ways), but how best to choose which
> of the returned people to allow to "fit" the job? An obviously desirable 
> way to do it is using the score, but all discussion
> seems to say "don't do that, since absolute score isn't meaningful" (and I 
> don't use the normalized score, BTW, I use the
> raw score, but the same caveats apply). Certainly the relative scores for 
> a single query can be used to rank the goodness
> of fit to that particular job, but that doesn't solve the problem in 
> general. 
> Should I just give up and return the "25 best" fits to the job, and only 
> use score to rank them relative to one another? This
> then means that a job description that has very few words that match 
> *anything* in the collection of resumes will still
> produce 25 people that "match". In practice (anecdotally) it does appear 
> to me that when the highest score for a 
> particular job description is fairly small (say 0.10) that there's a good 
> reason for that, and when the highest score is 
> something like 0.60, there's a good reason for that as well. That is, 
> queries that yield a small "best score" are queries
> for which I would not expect good matches, and vice versa. So it does seem 
> (again anecdotally) that the score has
> *some* relevance. What are the experts' thoughts on this?
> Donna L. Gresh
> Services Research, Mathematical Sciences Department
> IBM T.J. Watson Research Center
> (914) 945-2472
