lucene-java-user mailing list archives

From moraleslos <>
Subject Re: Lucene for name matching
Date Thu, 05 Apr 2007 20:33:34 GMT

Hi Grant!

Thanks for the reply.  I'll look into the links you suggested.  Just curious,
though: how did you implement this, if you can spill some of the
beans  ;-)  Do you think your approach was better than FuzzyQuery?
Was it a custom algorithm, or did you use some framework for this?  I
basically don't want to reinvent the wheel for this name-matching
problem.  Thanks in advance!


Grant Ingersoll-6 wrote:
> It's like deja vu all over again.  I literally just finished up a  
> similar task (about 2 hours ago).  I didn't use Lucene for it,  
> although I suppose I could have.  Lucene does have the FuzzyQuery  
> ( 
> javadoc/org/apache/lucene/search/FuzzyQuery.html) that uses  
> Levenshtein as a place to start.
> There are other string matching algorithms as well that are used in  
> various approaches.  See   
> Googling record linkage may help.  From there, you can pretty much  
> knock yourself out with all the different approaches.
> On Apr 5, 2007, at 3:58 PM, moraleslos wrote:
>> I was wondering if anyone has done people name matching using Lucene.  For
>> example, I have a name coming from some external source that I would like to
>> match with the one I have in my DB.  Let's say my DB contains the name "John
>> Smith".  If the external source has something like "Smith John", "Smith,
>> John", "J. Smith", etc., I would like to rate the match by some % of
>> closeness for later review.  I've searched around a bit for algorithms
>> and kept seeing the Levenshtein distance algorithm, which I'm sure Lucene
>> uses under the hood.  So I'm trying to gauge whether Lucene is useful for
>> something as specific as this, or whether there are better algorithms and/or
>> software out there for name matching.  Thanks in advance!
>> -los
>> -- 
>> View this message in context: 
>> matching-tf3533454.html#a9862342
>> Sent from the Lucene - Java Users mailing list archive at
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> Read the Lucene Java FAQ at 
> LuceneFAQ
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
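
For readers following the thread, here is a minimal sketch of the "% of closeness" idea from the original question. It is not Lucene's FuzzyQuery implementation and not the approach Grant used (the thread does not describe it). It assumes a simple scheme: lowercase the name, strip punctuation, sort the tokens so word order is ignored, then scale plain Levenshtein distance into a 0..1 score. The class and method names (NameMatcher, normalize, similarity) are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: order-insensitive name similarity.
final class NameMatcher {

    // Classic dynamic-programming Levenshtein distance, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumption: lowercase, replace non-letters with spaces, sort the tokens,
    // so that "Smith, John" and "John Smith" normalize to the same string.
    static String normalize(String name) {
        String[] tokens = name.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\s]", " ")
                .trim()
                .split("\\s+");
        Arrays.sort(tokens);
        return String.join(" ", tokens);
    }

    // Score in [0, 1]: 1.0 means identical after normalization.
    static double similarity(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        int maxLen = Math.max(x.length(), y.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(x, y) / maxLen;
    }

    public static void main(String[] args) {
        String stored = "John Smith";
        for (String candidate : new String[] {"Smith John", "Smith, John", "J. Smith"}) {
            System.out.printf("%s vs %s: %.2f%n", stored, candidate,
                    similarity(stored, candidate));
        }
    }
}
```

With this scheme the reordered forms score as exact matches, while an initial such as "J. Smith" is penalized for the missing letters; handling initials specially would be a further extension, and the record-linkage literature Grant mentions covers other measures.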
