lucy-user mailing list archives

From Nick Wellnhofer <wellnho...@aevum.de>
Subject Re: [lucy-user] Dictionary based NER with Lucy
Date Fri, 12 Oct 2012 14:10:51 GMT
On 12/10/2012 15:27, Aleksandar Radovanovic wrote:
> Thank you Nick. Could you possibly give me some more specific guidelines?
>
> At the moment, all indexed words are "flat" with no semantics - which is
> great for general purposes. However, if one focuses on, let's say
> biomedical literature, one would like to distinguish what words
> represent gene names, drugs names etc.. User would be able to compose
> search like "[drug_dictionary_ID] AND headache" to get documents
> containing all drug names related to headache.

First, create a schema with two full-text fields. One named "text" for 
the document content, and another one named "dict" for dictionary IDs. 
Then, before indexing a document, create a list of dictionary IDs 
related to that document. Store the IDs in the "dict" field separated by 
whitespace and index the document.
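
A minimal sketch of the indexing side, assuming Apache Lucy's Perl
bindings. The %dictionaries hash, the dict_ids_for() helper, the sample
document, the index path and the choice of an English PolyAnalyzer for
both fields are all hypothetical and only there for illustration:

use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Analysis::PolyAnalyzer;
use Lucy::Index::Indexer;

# Hypothetical dictionaries: dictionary ID => lowercase terms.
my %dictionaries = (
    drug_dictionary_ID => [qw( aspirin ibuprofen )],
    gene_dictionary_ID => [qw( brca1 tp53 )],
);

# Hypothetical helper: return the IDs of all dictionaries that have
# at least one term occurring in the given text.
sub dict_ids_for {
    my ($text) = @_;
    my %seen;
    $seen{ lc $_ } = 1 for $text =~ /\w+/g;
    my @ids = grep {
        my $id = $_;
        grep { $seen{$_} } @{ $dictionaries{$id} };
    } sort keys %dictionaries;
    return @ids;
}

# Schema with two full-text fields, "text" and "dict".
my $schema = Lucy::Plan::Schema->new;
my $type   = Lucy::Plan::FullTextType->new(
    analyzer => Lucy::Analysis::PolyAnalyzer->new( language => 'en' ),
);
$schema->spec_field( name => 'text', type => $type );
$schema->spec_field( name => 'dict', type => $type );

my $indexer = Lucy::Index::Indexer->new(
    index  => '/path/to/index',    # placeholder path
    schema => $schema,
    create => 1,
);

# Store the matching dictionary IDs, separated by whitespace.
my $content = 'Aspirin is often taken for headache.';
$indexer->add_doc({
    text => $content,
    dict => join( ' ', dict_ids_for($content) ),
});
$indexer->commit;

With these sample dictionaries, the document above would get
"drug_dictionary_ID" in its "dict" field. A real matcher would likely
need to handle multi-word names and synonyms.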

For the search part, you can write your own query parser, or use the 
excellent Search::Query module which supports the "field:value" syntax. 
Something like this should work:

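use Search::Query;

# Parse "field:value" syntax; terms without a field search "text".
my $parser = Search::Query->parser(
     dialect => 'Lucy',
     default_field => 'text',
);
my $query = $parser->parse('dict:drug_dictionary_ID AND headache');
# Convert the parsed query into a native Lucy query object.
my $lucy_query = $query->as_lucy_query();
# $lucy_searcher is a searcher opened on your index.
my $hits = $lucy_searcher->hits( query => $lucy_query );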

Hope this helps,

Nick
