lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Lea <ian....@gmail.com>
Subject Re: Searching partial names using Lucene
Date Fri, 25 Mar 2011 09:57:58 GMT
Have you tried stemming?  Simpler and available in core lucene.  Look
at PorterStemFilter or use your favourite search engine to find more
info and options.

If instead you go the synonym route, there is sample code in Lucene in
Action and a wordnet contrib module you might find useful.


--
Ian.

On Thu, Mar 24, 2011 at 7:57 PM, Sujit Pal <sujit.pal@comcast.net> wrote:
> I don't know if there is already an analyzer available for this, but you
> could use GATE or UIMA for Named Entity Extraction against names and
> expand the query to include the extra names that are used synonymously.
> You could do this outside Lucene or inline using a custom Lucene
> tokenizer that embeds either a GATE or UIMA NER.
>
> If you go the custom route (and you are not familiar with GATE or UIMA),
> you may want to take a look at Dr Manu Konchady's book on Lingpipe,
> Lucene and GATE - there is code in there to embed a GATE NER into a
> Lucene tokenizer (although its not a streaming tokenizer due to the
> nature of the NER process). The process would be similar for embedding a
> UIMA NER.
>
> GATE (ANNIE) contains data files that list the common synonyms (eg. Bill
> == William, Bob == Robert, Tom == Thomas, etc) which you can leverage
> with GATE's Jape rule language. Alternatively, you could use the same
> data from UIMA using a custom analysis engine (I prefer this route
> because this is all Java, easier learning curve and maintainability).
>
> -sujit
>
> On Thu, 2011-03-24 at 14:31 -0400, Deepak Konidena wrote:
>> Hi,
>>
>> I  would like to build a search system where a search for "Dan" would also search
for "Daniel" and a search for "Will", "William" . Any ideas on how to go about implementing
that? I can think of writing a custom Analyzer that would map these partial tokens to their
full firstname or lastnames. But is there an Analyzer in lucene contrib modules or elsewhere
that does a similar job for me?
>>
>> Thanks,
>> Deepak Konidena.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message