lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Deepak Konidena <>
Subject Re: Searching partial names using Lucene
Date Fri, 25 Mar 2011 14:16:19 GMT

On 3/25/11 5:57 AM, "Ian Lea" <> wrote:

>Have you tried stemming?  Simpler and available in core lucene.  Look
>at PorterStemFilter or use your favourite search engine to find more
>info and options.


I did try PorterStemFilter and couldn't get the result I wanted. (Dan ==
Daniel, Will == William, etc..)
>If instead you go the synonym route, there is sample code in Lucene in
>Action and a wordnet contrib module you might find useful.

Thanks, Will take a look at that.
>On Thu, Mar 24, 2011 at 7:57 PM, Sujit Pal <> wrote:
>> I don't know if there is already an analyzer available for this, but you
>> could use GATE or UIMA for Named Entity Extraction against names and
>> expand the query to include the extra names that are used synonymously.
>> You could do this outside Lucene or inline using a custom Lucene
>> tokenizer that embeds either a GATE or UIMA NER.
>> If you go the custom route (and you are not familiar with GATE or UIMA),
>> you may want to take a look at Dr Manu Konchady's book on Lingpipe,
>> Lucene and GATE - there is code in there to embed a GATE NER into a
>> Lucene tokenizer (although its not a streaming tokenizer due to the
>> nature of the NER process). The process would be similar for embedding a
>> GATE (ANNIE) contains data files that list the common synonyms (eg. Bill
>> == William, Bob == Robert, Tom == Thomas, etc) which you can leverage
>> with GATE's Jape rule language. Alternatively, you could use the same
>> data from UIMA using a custom analysis engine (I prefer this route
>> because this is all Java, easier learning curve and maintainability).
>> -sujit
>> On Thu, 2011-03-24 at 14:31 -0400, Deepak Konidena wrote:
>>> Hi,
>>> I  would like to build a search system where a search for "Dan" would
>>>also search for "Daniel" and a search for "Will", "William" . Any ideas
>>>on how to go about implementing that? I can think of writing a custom
>>>Analyzer that would map these partial tokens to their full firstname or
>>>lastnames. But is there an Analyzer in lucene contrib modules or
>>>elsewhere that does a similar job for me?
>>> Thanks,
>>> Deepak Konidena.
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
>To unsubscribe, e-mail:
>For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message