uima-user mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: Lucene cas consumer
Date Fri, 05 Dec 2008 23:40:11 GMT
> The "problem", and that is UIMA's power, is that everyone has their own
> type system.
> To produce a Lucene document, one extracts information from some
> features and applies the right analyzer. In my case I use perhaps only
> 10% of the annotations produced by the analysis pipeline to produce a
> single Lucene document.
> So we need a highly configurable component that maps only certain
> declared features, applies the right analyzer, and so on.
> Many approaches are possible:
> - completely programmatic: the indexer is abstract and must be
> extended to implement the right mapping for a specialized type system
> and pipeline
> - configurable: mapping rules are defined in a descriptor file; the
> JENA component followed this approach
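To make the first (programmatic) option concrete, here is a minimal sketch. It is written in Python and does not use the real UIMA or Lucene APIs. `Annotation`, `LuceneIndexer`, `PersonIndexer`, the type name `org.example.Person` and the analyzer name `keyword` are all hypothetical, and a Lucene document is approximated by a dict mapping each field name to (value, analyzer) pairs. The point is only that the mapping lives in code, so supporting another type system means writing a new subclass.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Annotation:
    # Simplified, hypothetical stand-in for a UIMA annotation:
    # a type name plus a dict of feature values.
    type_name: str
    features: dict


class LuceneIndexer(ABC):
    """Hypothetical abstract indexer: subclasses hard-code the mapping."""

    @abstractmethod
    def map_annotation(self, ann):
        """Return (field_name, value, analyzer_name) tuples for one annotation."""

    def build_document(self, annotations):
        # A Lucene document is approximated here by a dict:
        # field name -> list of (value, analyzer name) pairs.
        doc = {}
        for ann in annotations:
            for field_name, value, analyzer in self.map_annotation(ann):
                doc.setdefault(field_name, []).append((value, analyzer))
        return doc


class PersonIndexer(LuceneIndexer):
    # Mapping for one specific (hypothetical) type system; supporting a
    # different type system means writing another subclass.
    def map_annotation(self, ann):
        if ann.type_name == "org.example.Person":
            return [("person", ann.features["name"], "keyword")]
        return []  # every other annotation type is ignored


anns = [
    Annotation("org.example.Person", {"name": "Alice"}),
    Annotation("org.example.Token", {"pos": "NN"}),
]
print(PersonIndexer().build_document(anns))  # {'person': [('Alice', 'keyword')]}
```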

I prefer mapping rules in the descriptor. Many users will have to
adjust these rules to make them compatible with their type system,
and hard-coding the mapping rules makes this task more difficult.
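The descriptor-driven alternative can be sketched the same way. Here the mapping rules are data, and a single generic indexer interprets them. This is again a hypothetical Python sketch and not UIMA's descriptor format or Lucene's API. The rule syntax `Type/feature -> field : analyzer` and the names `MappingRule`, `parse_rules` and `build_document` are invented for illustration. In a real annotator the rule text would presumably come from a configuration parameter in the descriptor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    type_name: str  # annotation type to select
    feature: str    # feature whose value is indexed
    field: str      # target Lucene field name
    analyzer: str   # name of the analyzer to apply to that field


def parse_rules(text):
    """Parse hypothetical rule lines of the form 'Type/feature -> field : analyzer'."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, target = (part.strip() for part in line.split("->"))
        type_name, feature = source.rsplit("/", 1)
        field, analyzer = (part.strip() for part in target.split(":"))
        rules.append(MappingRule(type_name, feature, field, analyzer))
    return rules


def build_document(annotations, rules):
    """Map only the annotations/features named by a rule; ignore everything else.

    annotations: list of (type_name, features_dict) pairs (a stand-in for a CAS).
    Returns a dict: field name -> list of (value, analyzer name) pairs.
    """
    doc = {}
    for type_name, features in annotations:
        for rule in rules:
            if rule.type_name == type_name and rule.feature in features:
                doc.setdefault(rule.field, []).append(
                    (features[rule.feature], rule.analyzer))
    return doc


RULES = """
# Type/feature -> field : analyzer
org.example.Person/name -> person : keyword
org.example.Sentence/coveredText -> body : standard
"""

anns = [
    ("org.example.Person", {"name": "Alice"}),
    ("org.example.Token", {"pos": "NN"}),
    ("org.example.Sentence", {"coveredText": "Mapping rules live in the descriptor."}),
]
print(build_document(anns, parse_rules(RULES)))
```

Adapting this to a different type system only means editing the rule text. The indexer code stays the same. Annotations and features that no rule names are skipped, such as the Token above. This matches the requirement of indexing only a small subset of the pipeline's output.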

As far as I know, this approach was also chosen by the
regex annotator in the sandbox.

Jörn