lucene-general mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Subjects DB Matching
Date Tue, 07 Oct 2008 15:43:53 GMT
Hi Mauro,

I'd go to one of the Lucene mail archives and search for "record
linkage"; there you will find various conversations on the topic [1].
Also, try googling for that term.  In particular, you might look for
work by W. Winkler at the U.S. Census Bureau, amongst others.  There is
also the SecondString package by William Cohen at CMU that may help,
but I don't know if it scales or how well supported it is.

Also see http://en.wikipedia.org/wiki/Jaro-Winkler as a starting  
point.  In short, I think Lucene could facilitate such a system, but  
it probably isn't going to be the main piece.
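
To make the "minimum score" idea concrete, here is a minimal sketch of Jaro-Winkler similarity with a no-match cutoff. It is plain Python, not Lucene code; the function names, the 0.9 threshold, and the 0.1 prefix weight are illustrative assumptions that would need tuning against real data.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_weight=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1.0 - base)


def best_match(query, candidates, threshold=0.9):
    """Return the closest candidate, or None if nothing reaches threshold.

    The threshold is a hypothetical value, not a recommended setting.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        score = jaro_winkler(query, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None
```

In a setup like this, Lucene could be used to pull a small candidate set for the input record, and a string-similarity cutoff of this kind would then decide whether the top candidate is accepted or reported as no match.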

-Grant

[1] http://lucene.markmail.org/message/nyz7hrmzgzkwporq?q=record+linkage

On Sep 29, 2008, at 9:12 AM, mauro fraboni wrote:

> I am studying the possibility of using Lucene to build a
> matching system for a database of subjects.
> The subjects are stored as database records with fields such as
> name, surname, and address, and I would like to build a proximity
> matcher that finds an input subject in the DB.
> The idea is to map each record to a document, so that the fields of
> the record become the fields of the document.
>
> The problem is that my matching system should be quite accurate: it
> should return only the single best-matching subject (the one nearest
> to the input), and no subject at all otherwise. I am not able to
> find a valid rule for the no-match case. Is it possible to find a rule
> based on the score that tells me the input subject is not near enough
> to any subject in the DB, so it should not be matched? Is it possible
> to find a minimum score for this purpose?
> Any suggestions will be appreciated.
>
> ciao mauro


