lucene-java-user mailing list archives

From Carsten Schnober <schno...@ids-mannheim.de>
Subject Specialized Analyzer for names
Date Fri, 23 Nov 2012 14:36:54 GMT
Hi,
I'm indexing names in a dedicated Lucene field and I wonder which
analyzer to use for that purpose. Typically, the names are in the format
"John Smith", so the WhitespaceAnalyzer is probably the best choice in
most cases, and TextField seems to be the right field type.
Or would you rather recommend the KeywordAnalyzer? I'm a bit wary of
that because I'm afraid it would force users into wildcard or regex
queries such as "*Smith" or ".*Smith", respectively.
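To make the trade-off concrete, here is a minimal sketch of the tokens each analyzer would produce for a name. It is a plain-Java emulation, not code written against the Lucene API, and the class and method names are hypothetical. It assumes the documented behavior that WhitespaceAnalyzer splits on whitespace without lowercasing, while KeywordAnalyzer emits the whole input as a single token.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical plain-Java emulation of two Lucene analyzers (not Lucene code). */
public class NameTokenDemo {

    // Approximates WhitespaceAnalyzer: split on Character.isWhitespace, keep case.
    static List<String> whitespaceTokens(String name) {
        List<String> tokens = new ArrayList<>();
        for (String part : name.split("\\p{javaWhitespace}+")) {
            if (!part.isEmpty()) { // leading whitespace yields an empty first part
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Approximates KeywordAnalyzer: the entire field value is one token.
    static List<String> keywordTokens(String name) {
        return Collections.singletonList(name);
    }

    public static void main(String[] args) {
        String name = "John Smith";
        List<String> ws = whitespaceTokens(name);
        List<String> kw = keywordTokens(name);
        System.out.println("whitespace: " + ws); // prints "whitespace: [John, Smith]"
        System.out.println("keyword: " + kw);    // prints "keyword: [John Smith]"
        // An exact term "Smith" exists only in the whitespace variant; the
        // keyword variant would need a leading-wildcard query such as *Smith.
        System.out.println(ws.contains("Smith")); // true
        System.out.println(kw.contains("Smith")); // false
    }
}
```

With the keyword variant the surname never becomes a term of its own, which is why matching it would require a leading wildcard or a regex.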

However, there might also be special cases and spelling variants of
all kinds, e.g. "Smith, John", "John 'Hammer' Smith", "Abd al-Aziz",
"Stan van Hoop" and whatever else one can imagine. Is there a special
Analyzer optimized for such cases, or do I have to normalize the names
beforehand?
I see that such special characters and spellings can easily be covered
by the right queries, but that requires the user to know the exact
spelling, which is what I'm trying to spare her.
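For the "normalize beforehand" option, a minimal sketch of a pre-indexing step in plain Java might look like the following. The class name and every rule in it (one comma means "Last, First", quotes and hyphens act as separators, diacritics are folded, everything is lowercased) are illustrative assumptions, not an existing Lucene component. Lucene does ship token filters such as LowerCaseFilter and ASCIIFoldingFilter that cover part of this inside an analyzer chain.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical name normalizer; the rules are illustrative assumptions only. */
public class NameNormalizer {

    static String normalize(String raw) {
        String name = raw.trim();

        // Assumption: exactly one comma means "Last, First" -> "First Last".
        int comma = name.indexOf(',');
        if (comma >= 0 && comma == name.lastIndexOf(',')) {
            name = name.substring(comma + 1).trim() + " " + name.substring(0, comma).trim();
        }

        // Fold diacritics: decompose (NFD), then drop combining marks.
        name = Normalizer.normalize(name, Normalizer.Form.NFD).replaceAll("\\p{M}", "");

        // Treat quotes and hyphens as separators, so "al-Aziz" yields "al" and "aziz".
        name = name.replaceAll("['\"\\-]", " ");

        // Lowercase and collapse runs of whitespace into single spaces.
        return name.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String[] samples = {
            "John Smith", "Smith, John", "John 'Hammer' Smith", "Abd al-Aziz", "Stan van Hoop"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + normalize(s));
        }
    }
}
```

The normalized string could then be indexed with a whitespace-splitting analyzer. The same normalize() step would have to be applied to the query text as well, so that "Smith, John" and "john smith" end up as the same terms.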

Best regards,
Carsten

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
