> The "problem", which is also UIMA's strength, is that everyone has their
> own type system.
> To produce a Lucene document, one extracts information from some
> features, applying the right analyzer. In my case I use maybe only 10%
> of the annotations produced by the analysis pipeline to produce a
> single Lucene doc.
> So we need a highly configurable component that maps only certain
> declared features, applies the right analyzer, and so on.
> Many ways are possible:
> - completely programmatic: the indexer is abstract and should be
>   extended to implement the right mapping for a specialized type system
>   and pipeline
> - configurable: mapping rules are defined in a descriptor file; the
>   JENA component followed this way

I prefer mapping rules in the descriptor. Many users will have to adjust
these rules to make them compatible with their own type system, and hard
coding the mapping rules makes that task more difficult.

As far as I know, this approach was also chosen by the regex annotator in
the sandbox.

Jörn
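
P.S. To make the descriptor idea concrete, here is a rough sketch of what rule-driven mapping could look like. Everything in it is an assumption for illustration: the four-column rule format (`type feature field analyzer`), the `MappingSketch`, `Rule` and `Annotation` names, and the example types are all made up. It uses plain Java stand-ins rather than the real UIMA CAS or Lucene classes, so it only shows the selection logic, not an actual indexer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MappingSketch {

    /** Hypothetical rule: copy one feature of one annotation type into one index field. */
    static final class Rule {
        final String type, feature, field, analyzer;

        Rule(String type, String feature, String field, String analyzer) {
            this.type = type;
            this.feature = feature;
            this.field = field;
            this.analyzer = analyzer;
        }
    }

    /** Stand-in for an annotation; a real indexer would read feature values from the CAS. */
    static final class Annotation {
        final String type;
        final Map<String, String> features;

        Annotation(String type, Map<String, String> features) {
            this.type = type;
            this.features = features;
        }
    }

    /** Parses rule lines of the assumed form "type feature field analyzer"; '#' starts a comment. */
    static List<Rule> parseRules(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 4) {
                throw new IllegalArgumentException("Malformed rule: " + line);
            }
            rules.add(new Rule(parts[0], parts[1], parts[2], parts[3]));
        }
        return rules;
    }

    /** Returns "field=value [analyzer]" entries, one per feature selected by a rule. */
    static List<String> map(List<Rule> rules, List<Annotation> annotations) {
        List<String> fields = new ArrayList<>();
        for (Annotation a : annotations) {
            for (Rule r : rules) {
                if (!r.type.equals(a.type)) {
                    continue;
                }
                String value = a.features.get(r.feature);
                if (value != null) {
                    fields.add(r.field + "=" + value + " [" + r.analyzer + "]");
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        List<Rule> rules = parseRules(List.of(
                "# type feature field analyzer",
                "example.Person name author keyword",
                "example.Sentence text body standard"));
        List<Annotation> annotations = List.of(
                new Annotation("example.Person", Map.of("name", "Ada", "id", "42")),
                new Annotation("example.Token", Map.of("pos", "NN")),
                new Annotation("example.Sentence", Map.of("text", "Hello world")));
        // Only the two features named in the rules are mapped; the rest are ignored.
        map(rules, annotations).forEach(System.out::println);
    }
}
```

With this shape, adapting the indexer to another type system means editing the rule lines rather than subclassing anything, which is the point of the descriptor approach.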