lucene-solr-user mailing list archives

From "Wunderlich, Tobias" <tobias.wunderl...@igd-r.fraunhofer.de>
Subject Creating custom Filter / Tokenizer / Request Handler for integration of NER-Framework
Date Thu, 24 May 2012 12:36:59 GMT
Hey Guys,

I am currently working on a project to integrate a named-entity recognition (NER) framework
into an existing search platform based on Solr. The platform uses ManifoldCF (MCF) to
automatically gather content from various repositories. The NER framework creates annotations
from the given content, which I want to add to the search platform as metadata for faceting.
Since MCF handles all content gathering, I need a way to integrate the NER framework
directly into Solr. The goal is to get all annotations for each document into a multivalued field.

My first thought was to create a custom filter that takes the content and returns only the
annotations. But as I understand it, a filter only processes tokens that the tokenizer has
already produced, which is useless for my purpose, since the NER framework needs to process
the whole content of a document.

What about a custom Tokenizer? Would it be possible to process the whole text and return
only the annotations as tokens?

A third thought was to modify the ExtractingRequestHandler (Solr Cell), which MCF uses,
so that it somehow adds the annotations as metadata when the content and metadata are
distributed to the different fields.
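
To make the goal concrete, here is a minimal sketch in plain Java, independent of Solr,
of the step every option above would have to perform: hand the whole document text to
the NER framework and collect the distinct annotations as values for one multivalued
field. The names NerFramework, CapitalizedWordNer and annotationValues are hypothetical
and used only for illustration. The toy recognizer (runs of capitalized words) is just a
stand-in for the real framework, whose API is not described here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnnotationFieldSketch {

    // Hypothetical stand-in for the NER framework: it receives the whole text at once.
    interface NerFramework {
        List<String> annotate(String fullText);
    }

    // Toy recognizer, for illustration only: treats runs of capitalized words as entities.
    static class CapitalizedWordNer implements NerFramework {
        private static final Pattern ENTITY =
                Pattern.compile("\\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\\b");

        @Override
        public List<String> annotate(String fullText) {
            List<String> found = new ArrayList<>();
            Matcher m = ENTITY.matcher(fullText);
            while (m.find()) {
                found.add(m.group());
            }
            return found;
        }
    }

    // Returns the distinct annotations in order of first occurrence,
    // i.e. the values that would go into one multivalued field.
    static List<String> annotationValues(NerFramework ner, String content) {
        Set<String> distinct = new LinkedHashSet<>(ner.annotate(content));
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        String content = "Tobias works at Fraunhofer. Tobias uses Solr.";
        System.out.println(annotationValues(new CapitalizedWordNer(), content));
    }
}
```

Where this step would be hooked into Solr (filter, tokenizer, or request handler) is
exactly the open question. The sketch only shows that the input has to be the full
text rather than individual tokens.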

I hope my problem description is sufficient. Does anybody have any thoughts on that subject?

Best regards,
Tobias

