lucene-java-user mailing list archives

From Deepak Konidena <>
Subject Re: Searching partial names using Lucene
Date Fri, 25 Mar 2011 14:14:17 GMT
Thanks for the detailed response, Sujit. UIMA especially looks like an
interesting option.

On 3/24/11 3:57 PM, "Sujit Pal" <> wrote:

>I don't know if there is already an analyzer available for this, but you
>could use GATE or UIMA for Named Entity Extraction against names and
>expand the query to include the extra names that are used synonymously.
>You could do this outside Lucene or inline using a custom Lucene
>tokenizer that embeds either a GATE or UIMA NER.
>If you go the custom route (and you are not familiar with GATE or UIMA),
>you may want to take a look at Dr Manu Konchady's book on LingPipe,
>Lucene and GATE - there is code in there to embed a GATE NER into a
>Lucene tokenizer (although it's not a streaming tokenizer due to the
>nature of the NER process). The process would be similar for embedding a
>UIMA NER. GATE (ANNIE) contains data files that list the common synonyms
>(e.g. Bill == William, Bob == Robert, Tom == Thomas, etc.) which you can
>leverage with GATE's JAPE rule language. Alternatively, you could use the
>same data from UIMA using a custom analysis engine (I prefer this route
>because it is all Java, with an easier learning curve and better
>maintainability).
>On Thu, 2011-03-24 at 14:31 -0400, Deepak Konidena wrote:
>> Hi,
>> I would like to build a search system where a search for "Dan" would
>>also search for "Daniel", and a search for "Will" would also search for
>>"William". Any ideas on how to go about implementing that? I can think
>>of writing a custom Analyzer that would map these partial tokens to
>>their full first names or last names. But is there an Analyzer in the
>>Lucene contrib modules or elsewhere that does a similar job for me?
>> Thanks,
>> Deepak Konidena.
>To unsubscribe, e-mail:
>For additional commands, e-mail:
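
For reference, the "expand the query outside Lucene" option mentioned
above could look roughly like the sketch below. It is plain Java with no
Lucene, GATE or UIMA calls. The class name NameQueryExpander, the methods
expandTerm and expandQuery, and the tiny nickname table are all
hypothetical and for illustration only; a real table would have to come
from a name-synonym resource such as the data files mentioned above. The
output is meant to be a boolean query string of the kind Lucene's classic
query parser accepts.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class NameQueryExpander {

    // Hypothetical nickname table (assumption): short form -> full forms.
    // In practice this would be loaded from an external synonym list.
    private static final Map<String, List<String>> VARIANTS = Map.of(
            "dan", List.of("daniel"),
            "will", List.of("william"),
            "bill", List.of("william"),
            "bob", List.of("robert"),
            "tom", List.of("thomas"));

    /** Returns the lower-cased term followed by any known full forms. */
    static Set<String> expandTerm(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        Set<String> out = new LinkedHashSet<>();
        out.add(key);
        out.addAll(VARIANTS.getOrDefault(key, List.of()));
        return out;
    }

    /** ORs each term with its variants, then ANDs the terms together. */
    static String expandQuery(String userQuery) {
        List<String> clauses = new ArrayList<>();
        for (String term : userQuery.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input yields no clauses
            }
            Set<String> alts = expandTerm(term);
            clauses.add(alts.size() == 1
                    ? alts.iterator().next()
                    : "(" + String.join(" OR ", alts) + ")");
        }
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        // prints: (dan OR daniel) AND smith
        System.out.println(expandQuery("Dan Smith"));
    }
}
```

The sketch only expands in one direction (a search for "Dan" also matches
"Daniel", not the reverse) and does not escape query-syntax characters in
the input, so both would need attention before real use.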
