lucene-java-user mailing list archives

From "Darren Hartford"
Subject Geneology, nicknames, levenstein, soundex/metaphone, etc
Date Fri, 29 Jun 2007 15:41:20 GMT
Hey all,
As you can tell from the subject, I'm interested in 'name searching' and
'nearby name' searching.  Scenarios include genealogy and
similar-person-from-different-datasources matching.  I'm assuming
Java-based Lucene, and more than likely the Solr project.

*nickname:  would it be feasible to create an Analyzer that ties into
an external/internal nickname datasource (the datasource would vary
dramatically based on nationality)?  Use case:  Jon, John, Johnny, and
Jonathan would all carry 'weight' in the relevance.  Similarly 'Dick',
'Chuck', and 'Charles'.
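
To make the nickname idea concrete, here is a minimal sketch in plain Java of the lookup such an Analyzer might delegate to. It does not use any Lucene API; the class name `NicknameExpander`, its methods, and the in-memory table are all hypothetical stand-ins for whatever external datasource would really be used.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: maps a name to the group of names treated as equivalent.
// A real table would be loaded from an external, nationality-specific datasource.
final class NicknameExpander {
    private final Map<String, Set<String>> groupByName = new HashMap<>();

    // Registers a group of equivalent names (stored lower-cased).
    // Assumption: a name belongs to one group; a later group overwrites an earlier one.
    void addGroup(String... names) {
        Set<String> group = new TreeSet<>();
        for (String n : names) {
            group.add(n.toLowerCase(Locale.ROOT));
        }
        for (String n : group) {
            groupByName.put(n, group);
        }
    }

    // Returns every known variant of the name, or just the name itself if unknown.
    Set<String> expand(String name) {
        String key = name.toLowerCase(Locale.ROOT);
        return groupByName.getOrDefault(key, Collections.singleton(key));
    }

    public static void main(String[] args) {
        NicknameExpander expander = new NicknameExpander();
        expander.addGroup("Jon", "John", "Johnny", "Jonathan");
        System.out.println(expander.expand("Johnny"));
    }
}
```

The expanded set could then be indexed or queried as alternative terms, with the original spelling given a higher weight than the variants; how that weighting is wired in is left open here.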

*Levenshtein distance:  This is why I'm looking at Lucene and the related
Solr project - Levenshtein already exists, but seems to be a separate
query/relevance metric.  Is there a way to add it to the overall weight
across multiple Analyzers?  Sorry, I'm very ignorant of the capabilities and
am curious.
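
For reference, the metric itself can be sketched in plain Java. This is the textbook dynamic-programming version, not Lucene's own implementation; the class name `Levenshtein` and the length-normalised `similarity` helper are assumptions made for illustration, the latter being one possible way to turn a distance into a 0..1 value that could feed an overall weight.

```java
// Textbook Levenshtein edit distance using two rolling rows.
final class Levenshtein {
    // Number of single-character insertions, deletions, or substitutions
    // needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalisation: 1.0 for identical strings, 0.0 for nothing in common.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) distance(a, b) / max;
    }

    public static void main(String[] args) {
        System.out.println(distance("jon", "john"));
        System.out.println(similarity("jon", "john"));
    }
}
```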

*soundex/metaphone: Again, this is more about adding to an overall weight
than running a separate, distinct query.  This may turn out not to be
useful in the overall picture once Levenshtein distance and a nickname
Analyzer are in use, but I'd like to have the capability there just in case.
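
As an illustration of what phonetic matching would contribute, here is a hand-rolled sketch of the classic American Soundex code in plain Java. The class name `Soundex` and its `encode` method are assumptions for this example; in practice an existing phonetic library would normally be used rather than code like this.

```java
import java.util.Locale;

// Sketch of American Soundex: first letter plus three digits.
final class Soundex {
    // Codes for A..Z. '0' marks vowels and Y (they separate repeated codes);
    // '-' marks H and W (they are skipped and do not separate repeated codes).
    private static final String CODES = "0123012-02245501262301-202";

    static String encode(String name) {
        StringBuilder out = new StringBuilder();
        char last = 0;
        for (char raw : name.toUpperCase(Locale.ROOT).toCharArray()) {
            if (raw < 'A' || raw > 'Z') {
                continue; // ignore anything that is not a basic Latin letter
            }
            char code = CODES.charAt(raw - 'A');
            if (out.length() == 0) {
                out.append(raw); // the first letter is kept as-is
                last = code;
            } else if (code != '-') {
                if (code != '0' && code != last) {
                    out.append(code);
                }
                last = code;
            }
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() > 0 && out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Jon") + " " + encode("John"));
    }
}
```

Names that share a code (as 'Jon' and 'John' do here) could be treated as a weaker match than an exact or nickname match.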

*cross-field matches: I am planning on indexing specific named fields,
two of them being 'maiden name' and 'last name'.  When searching for a
'last name' value, how do I manage the weight for values that might (and
very well will) be in two different fields?
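
One way to picture the cross-field question is as a per-field boost, where a hit in 'last name' counts for more than the same value found in 'maiden name'. The plain-Java sketch below takes the best boosted match across fields; the class name `CrossFieldScore`, the field names, and the boost values are all hypothetical, and this is not how Lucene itself computes scores.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: score a record by the highest-boosted field that matches the query.
final class CrossFieldScore {
    // record: field name -> stored value; boosts: field name -> weight.
    // Returns 0.0 when no boosted field matches.
    static double score(String query, Map<String, String> record,
                        Map<String, Double> boosts) {
        double best = 0.0;
        for (Map.Entry<String, Double> boost : boosts.entrySet()) {
            String value = record.get(boost.getKey());
            if (value != null && value.equalsIgnoreCase(query)) {
                best = Math.max(best, boost.getValue());
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> boosts = new HashMap<>();
        boosts.put("lastName", 1.0);   // illustrative weights only
        boosts.put("maidenName", 0.8);

        Map<String, String> record = new HashMap<>();
        record.put("lastName", "Smith");
        record.put("maidenName", "Jones");

        System.out.println(score("Jones", record, boosts));
    }
}
```

Taking the maximum rather than the sum keeps a value that appears in both fields from outranking a clean 'last name' match; summing would be the alternative if matches in both fields should reinforce each other.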

Sorry for the short/abrupt post, but if I spelled it all out it would be
huge.  Just asking about some key points ;-)

