lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Lucene for name matching
Date Thu, 05 Apr 2007 20:17:08 GMT
It's like deja vu all over again.  I literally just finished up a
similar task (about 2 hours ago).  I didn't use Lucene for it,
although I suppose I could have.  As a place to start, Lucene does
have FuzzyQuery, which matches terms by Levenshtein edit distance:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/FuzzyQuery.html
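
To make the Levenshtein idea concrete, here is a minimal plain-Java sketch of an edit distance turned into a 0..1 closeness score.  It is only an illustration and is not Lucene's FuzzyQuery code; the class and method names (LevenshteinSketch, distance, similarity) and the choice to normalize by the longer string's length are assumptions made for the example.

```java
// Hypothetical sketch: a standalone Levenshtein similarity, not Lucene's implementation.
class LevenshteinSketch {

    // Classic dynamic-programming edit distance; insert, delete and
    // substitute each cost 1. Only two rows of the table are kept.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the arrays for the next row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 for identical strings, lower as more edits are needed.
    static double similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / longer;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
        System.out.println(similarity("john smith", "jon smith"));
        System.out.println(similarity("john smith", "smith john"));
    }
}
```

A score like this handles misspellings reasonably well, but it penalizes reordered names such as "Smith John" heavily, so it is usually combined with some normalization of the name first.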

There are other string matching algorithms as well that are used in
various approaches.  See http://en.wikipedia.org/wiki/Edit_distance.
Googling "record linkage" may help.  From there, you can pretty much
knock yourself out with all the different approaches.
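
As one example of those other approaches, the sketch below compares names token by token rather than character by character, so that "Smith John", "Smith, John" and "J. Smith" can all be scored against "John Smith".  Everything in it is an assumption made for illustration: the class and method names are hypothetical, and the weights (1.0 for an exact token match, 0.5 for an initial matching a full token) are arbitrary and would need tuning against real data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: order-insensitive token matching for person names.
class NameTokenSketch {

    // Lowercase, turn everything except letters and spaces into spaces,
    // then split on whitespace. "Smith, John" becomes [smith, john].
    static List<String> tokens(String name) {
        String cleaned = name.toLowerCase(Locale.ROOT)
                             .replaceAll("[^\\p{L} ]", " ")
                             .trim();
        List<String> out = new ArrayList<>();
        if (!cleaned.isEmpty()) {
            out.addAll(Arrays.asList(cleaned.split("\\s+")));
        }
        return out;
    }

    // True when 'initial' is a single letter that starts the longer token 'full'.
    static boolean isInitialOf(String initial, String full) {
        return initial.length() == 1 && full.length() > 1
                && full.charAt(0) == initial.charAt(0);
    }

    // Score in [0, 1], ignoring token order.
    static double score(String a, String b) {
        List<String> ta = tokens(a);
        List<String> tb = tokens(b);
        if (ta.isEmpty() || tb.isEmpty()) {
            return 0.0;
        }
        int longer = Math.max(ta.size(), tb.size());
        double total = 0.0;

        // Pass 1: pair up tokens that match exactly (assumed weight 1.0).
        for (Iterator<String> it = ta.iterator(); it.hasNext();) {
            String t = it.next();
            if (tb.remove(t)) {
                total += 1.0;
                it.remove();
            }
        }
        // Pass 2: pair leftover tokens where one is an initial of the other
        // (assumed weight 0.5).
        for (String t : ta) {
            for (Iterator<String> it = tb.iterator(); it.hasNext();) {
                String u = it.next();
                if (isInitialOf(t, u) || isInitialOf(u, t)) {
                    total += 0.5;
                    it.remove();
                    break;
                }
            }
        }
        return total / longer;
    }

    public static void main(String[] args) {
        String db = "John Smith";
        for (String candidate : new String[] {"Smith John", "Smith, John", "J. Smith", "Jane Doe"}) {
            System.out.println(candidate + " -> " + score(db, candidate));
        }
    }
}
```

With these weights, the reordered forms score 1.0 and "J. Smith" scores 0.75 against "John Smith".  Swapping the exact-match test in pass 1 for an edit-distance threshold would let the same structure tolerate misspelled tokens as well.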

On Apr 5, 2007, at 3:58 PM, moraleslos wrote:

>
> I was wondering if anyone has done people name matching using Lucene.
> For example, I have a name coming from some external source that I
> would like to match with the one I have in my DB.  Let's say my DB
> contains the name "John Smith".  If the external source has something
> like "Smith John", "Smith, John", "J. Smith", etc., I would like to
> rate this matching based on some % of closeness for review later.
> I've searched around a bit for algorithms and I kept seeing the
> Levenshtein distance algorithm, which I'm sure Lucene uses under the
> hood.  So I am trying to gauge whether Lucene is useful for doing
> something as specific as this, or whether there are better algorithms
> and/or software out there that do name matching.  Thanks in advance!
>
> -los
> -- 
> View this message in context: http://www.nabble.com/Lucene-for-name-matching-tf3533454.html#a9862342
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


