Another problem (I just discovered this): TokenizerFactories do not
get resource loaders, so your Tokenizer can't read config or model
files. TokenFilters do, so you can use the KeywordTokenizer (which
emits the whole field value as one big term) and do your work in a
TokenFilter that receives the whole text.
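
As a rough, untested sketch of that setup, the field type in schema.xml
could look something like the following. The field type name, the
filter factory class, and its modelFile attribute are made-up names
standing in for whatever custom filter you write; only the solr.*
classes are real.

```xml
<!-- Hypothetical field type; everything except the solr.* classes is illustrative. -->
<fieldType name="ner_annotations" class="solr.TextField">
  <analyzer type="index">
    <!-- KeywordTokenizer emits the entire field value as a single token. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Hypothetical custom filter: receives that one token, runs NER over it,
         and emits one token per annotation. modelFile is an assumed parameter. -->
    <filter class="com.example.NerAnnotationFilterFactory" modelFile="ner-model.bin"/>
  </analyzer>
  <analyzer type="query">
    <!-- At query time, treat the input as one exact annotation term. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Keep in mind that the analysis chain only changes the indexed terms,
not the stored value. That should be fine for faceting, since facets
are computed from the indexed terms.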
On Thu, May 24, 2012 at 7:33 AM, Jan Høydahl <jan.asf@cominvent.com> wrote:
> As Ahmet says, the update chain is probably the place to integrate
> such document-oriented processing.
> See http://www.cominvent.com/2011/04/04/solr-architecture-diagram/
> for how it integrates with Solr.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.facebook.com/Cominvent
> Solr Training - www.solrtraining.com
>
> On 24. mai 2012, at 14:04, Wunderlich, Tobias wrote:
>
>> Hey Guys,
>>
>> I have recently been working on a project to integrate a
>> Named-Entity-Recognition (NER) framework into an existing search
>> platform based on Solr. The platform uses ManifoldCF (MCF) to
>> automatically gather content from various repositories. The NER
>> framework creates annotations/metadata from the given content,
>> which I then want to integrate into the search platform as metadata
>> to use for faceting. Since MCF handles all content gathering, I
>> need a way to integrate the NER framework directly into Solr. The
>> goal is to get all annotations per document into a multivalued
>> field.
>>
>> My first thought was to create a custom filter, which just takes
>> the content and gives back only the annotations. But as I
>> understand it, a filter only processes predetermined tokens, which
>> is useless for my purpose, since the NER framework needs to process
>> the whole content of a document. What about a custom Tokenizer?
>> Would it be possible to process the whole text and give back only
>> the annotations as tokens? A third thought was to manipulate the
>> ExtractingRequestHandler (Solr Cell) used by MCF to somehow add the
>> annotations as metadata when the content and metadata are
>> distributed to the different fields.
>>
>> I hope my problem description is sufficient. Does anybody have any
>> thoughts on that subject?
>>
>> Best regards,
>> Tobias
>
--
Lance Norskog
goksron@gmail.com