lucene-java-user mailing list archives

From Andy DePue
Subject Using Lucene to find duplicate/similar names
Date Wed, 16 Apr 2008 16:37:34 GMT
I'm new to Lucene, and would like to use it to find duplicate (or 
similar) names in a contact list.  Is Lucene a good fit?
We have a form where a user enters a company's or person's name, and we
want the system to warn them if a company or person with the same or a
similar name has already been entered.
Based on the little I know of Lucene, I'm thinking an NGram algorithm
(based on characters, not words) would work best, but I'm not sure
whether Lucene takes proximity or edit distance into account. For
example, say you have these two names:
  Andrew John
  John Andrew

If a user enters Andy John, then without proximity or edit distance these
two names will match about equally well, even though the first name
(Andrew John) should obviously be ranked higher.
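
To make the concern concrete, here is a minimal plain-Java sketch. It does
not use any Lucene API; the class NameSimilarity and its methods are made
up for illustration. It assumes character bigrams are taken inside each
word and compared as sets (Jaccard overlap), which is only one possible
reading of "NGram based on characters". Under that assumption the two
names above score identically against "Andy John", and a plain
Levenshtein edit distance over the whole string is used as a tie-breaker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class NameSimilarity {

    // Character bigrams taken inside each whitespace-separated word,
    // so word order has no effect on the resulting set.
    static Set<String> bigrams(String name) {
        Set<String> grams = new HashSet<>();
        for (String word : name.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            for (int i = 0; i + 2 <= word.length(); i++) {
                grams.add(word.substring(i, i + 2));
            }
        }
        return grams;
    }

    // Jaccard overlap of the two bigram sets: intersection size / union size.
    static double bigramScore(String a, String b) {
        Set<String> ga = bigrams(a);
        Set<String> gb = bigrams(b);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return (double) common.size() / union.size();
    }

    // Case-insensitive Levenshtein distance over the full strings,
    // computed with two rolling rows of the dynamic-programming table.
    static int editDistance(String a, String b) {
        String s = a.toLowerCase(Locale.ROOT);
        String t = b.toLowerCase(Locale.ROOT);
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    // Highest bigram score first; ties broken by smallest edit distance.
    static List<String> rank(String query, List<String> names) {
        List<String> ranked = new ArrayList<>(names);
        ranked.sort(Comparator
                .comparingDouble((String n) -> -bigramScore(query, n))
                .thenComparingInt(n -> editDistance(query, n)));
        return ranked;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("John Andrew", "Andrew John");
        System.out.println(rank("Andy John", names));
    }
}
```

With the two names from the example, both bigram scores are equal, the
edit distance from "Andy John" to "Andrew John" is 3 and the distance to
"John Andrew" is larger, so the sketch prints [Andrew John, John Andrew].
Whether Lucene itself can combine the two signals this way is exactly
what I'm asking.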
Thanks in advance for any help or advice.
