lucene-solr-user mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: Creating custom Filter / Tokenizer / Request Handler for integration of NER-Framework
Date Thu, 24 May 2012 14:33:30 GMT
As Ahmet says, the update chain is probably the right place to integrate this kind of
document-oriented processing. See http://www.cominvent.com/2011/04/04/solr-architecture-diagram/
for how the update chain fits into Solr.
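
To make the idea concrete, here is a minimal plain-Java sketch of what such an update
processor would do. It is only a model under stated assumptions, not Solr code: the names
NerUpdateSketch, EntityExtractor, Processor and NerProcessor, the field names "content" and
"entities", and the toy capitalized-word extractor are all hypothetical stand-ins. In Solr
itself the equivalent hook would be a subclass of UpdateRequestProcessor that overrides
processAdd(AddUpdateCommand), created by an UpdateRequestProcessorFactory and registered in
an updateRequestProcessorChain in solrconfig.xml.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NerUpdateSketch {

    /** Hypothetical stand-in for the NER framework: whole text in, annotations out. */
    interface EntityExtractor {
        List<String> extract(String text);
    }

    /** Hypothetical stand-in for one step of an update chain. */
    static abstract class Processor {
        private final Processor next;

        Processor(Processor next) {
            this.next = next;
        }

        /** Default behaviour: hand the document to the next step, if there is one. */
        void processAdd(Map<String, List<Object>> doc) {
            if (next != null) {
                next.processAdd(doc);
            }
        }
    }

    /** Runs the extractor over the full source field and fills a multivalued target field. */
    static class NerProcessor extends Processor {
        private final EntityExtractor extractor;
        private final String sourceField;
        private final String targetField;

        NerProcessor(EntityExtractor extractor, String sourceField,
                     String targetField, Processor next) {
            super(next);
            this.extractor = extractor;
            this.sourceField = sourceField;
            this.targetField = targetField;
        }

        @Override
        void processAdd(Map<String, List<Object>> doc) {
            List<Object> values = doc.get(sourceField);
            if (values != null) {
                for (Object value : values) {
                    // The extractor sees the whole field value, not individual tokens.
                    for (String entity : extractor.extract(String.valueOf(value))) {
                        doc.computeIfAbsent(targetField, k -> new ArrayList<>()).add(entity);
                    }
                }
            }
            super.processAdd(doc); // continue down the chain
        }
    }

    /** Toy extractor (assumption): treats runs of capitalized words as entities. */
    static List<String> capitalizedRuns(String text) {
        Matcher m = Pattern.compile("\\b[A-Z][a-z]+(?: [A-Z][a-z]+)*").matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> doc = new HashMap<>();
        doc.put("content", new ArrayList<>(Arrays.asList("Angela Merkel visited Paris.")));
        new NerProcessor(NerUpdateSketch::capitalizedRuns, "content", "entities", null)
                .processAdd(doc);
        System.out.println(doc.get("entities"));
    }
}
```

The point of the design is that the processor runs once per document, before analysis, so
the NER framework gets the complete text and the annotations end up as ordinary field values
that can be faceted on.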

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 24. mai 2012, at 14:04, Wunderlich, Tobias wrote:

> Hey Guys,
> 
> I am currently working on a project to integrate a named-entity recognition (NER)
> framework into an existing search platform based on Solr. The platform uses ManifoldCF
> (MCF) to automatically gather content from various repositories. The NER framework
> creates annotations from the given content, which I then want to add to the search
> platform as metadata to use for faceting. Since MCF handles all content gathering, I
> need a way to integrate the NER framework directly into Solr. The goal is to get all
> annotations per document into a multivalued field.
>
> My first thought was to create a custom filter that takes the content and returns only
> the annotations. But as I understand it, a filter only processes tokens that have
> already been produced, which is useless for my purpose, since the NER framework needs
> to process the whole content of a document. What about a custom tokenizer? Would it be
> possible to process the whole text and return only the annotations as tokens? A third
> thought was to modify the ExtractingRequestHandler (Solr Cell) used by MCF to somehow
> add the annotations as metadata when the content and metadata are mapped to the
> different fields.
> 
> I hope my problem description is sufficient. Does anybody have any thoughts on that subject?
> 
> Best regards,
> Tobias

