lucene-java-user mailing list archives

From Deepak Konidena <deepak.konid...@cornell.edu>
Subject Re: Searching partial names using Lucene
Date Fri, 25 Mar 2011 14:16:19 GMT

On 3/25/11 5:57 AM, "Ian Lea" <ian.lea@gmail.com> wrote:

>Have you tried stemming?  Simpler and available in core lucene.  Look
>at PorterStemFilter or use your favourite search engine to find more
>info and options.

Ian,

I did try PorterStemFilter, but couldn't get the result I wanted (Dan ==
Daniel, Will == William, etc.).
>
>If instead you go the synonym route, there is sample code in Lucene in
>Action and a wordnet contrib module you might find useful.

Thanks, I will take a look at that.
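
For reference, here is a minimal sketch of the query-expansion idea Sujit describes further down the thread, done outside Lucene in plain Java. The class name NicknameQueryExpander and the small nickname table are hypothetical, made up for illustration; a real table would come from a name list such as the GATE data files he mentions. The output uses the OR and parentheses syntax of Lucene's classic query parser, and the sketch assumes the indexed field is lower-cased at analysis time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: rewrites short first names
// into an OR group before the query string is handed to a query parser.
class NicknameQueryExpander {

    // Illustrative nickname table (assumption). A real one would be
    // loaded from a gazetteer file rather than hard-coded.
    private static final Map<String, List<String>> NICKNAMES = new HashMap<>();
    static {
        NICKNAMES.put("dan", Arrays.asList("daniel"));
        NICKNAMES.put("will", Arrays.asList("william"));
        NICKNAMES.put("bill", Arrays.asList("william"));
        NICKNAMES.put("bob", Arrays.asList("robert"));
        NICKNAMES.put("tom", Arrays.asList("thomas"));
    }

    // Lower-cases the input, splits it on whitespace, and replaces each
    // known nickname with "(nick OR full)". Unknown terms pass through.
    static String expand(String userQuery) {
        List<String> parts = new ArrayList<>();
        String normalized = userQuery.trim().toLowerCase(Locale.ROOT);
        for (String term : normalized.split("\\s+")) {
            if (term.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            List<String> fullNames = NICKNAMES.get(term);
            if (fullNames == null) {
                parts.add(term);
            } else {
                List<String> group = new ArrayList<>();
                group.add(term);
                group.addAll(fullNames);
                parts.add("(" + String.join(" OR ", group) + ")");
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(expand("Dan Smith")); // (dan OR daniel) smith
        System.out.println(expand("Will"));      // (will OR william)
    }
}
```

Expanding at query time keeps the index unchanged, so the nickname table can be edited without reindexing. The trade-off is that the mapping above is one-directional: a search for "Daniel" would not find documents that only say "Dan" unless the reverse entries are added too.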
>
>
>--
>Ian.
>
>On Thu, Mar 24, 2011 at 7:57 PM, Sujit Pal <sujit.pal@comcast.net> wrote:
>> I don't know if there is already an analyzer available for this, but you
>> could use GATE or UIMA for Named Entity Extraction against names and
>> expand the query to include the extra names that are used synonymously.
>> You could do this outside Lucene or inline using a custom Lucene
>> tokenizer that embeds either a GATE or UIMA NER.
>>
>> If you go the custom route (and you are not familiar with GATE or UIMA),
>> you may want to take a look at Dr Manu Konchady's book on Lingpipe,
>> Lucene and GATE - there is code in there to embed a GATE NER into a
>> Lucene tokenizer (although it's not a streaming tokenizer due to the
>> nature of the NER process). The process would be similar for embedding a
>> UIMA NER.
>>
>> GATE (ANNIE) contains data files that list the common synonyms (e.g. Bill
>> == William, Bob == Robert, Tom == Thomas, etc) which you can leverage
>> with GATE's Jape rule language. Alternatively, you could use the same
>> data from UIMA using a custom analysis engine (I prefer this route
>> because this is all Java, easier learning curve and maintainability).
>>
>> -sujit
>>
>> On Thu, 2011-03-24 at 14:31 -0400, Deepak Konidena wrote:
>>> Hi,
>>>
>>> I would like to build a search system where a search for "Dan" would
>>>also search for "Daniel", and a search for "Will" would also search for
>>>"William". Any ideas on how to go about implementing that? I can think
>>>of writing a custom Analyzer that would map these partial tokens to
>>>their full first names or last names. But is there an Analyzer in the
>>>Lucene contrib modules or elsewhere that does a similar job for me?
>>>
>>> Thanks,
>>> Deepak Konidena.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>



