uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: Preferably using UIMAfit, how can I dynamically generate types for a CollectionReader
Date Mon, 31 Mar 2014 06:59:52 GMT
You can pass your TSD (TypeSystemDescription) to the reader:

CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(
    YourReader.class, tsd, PARAM_1, value_1, PARAM_2, value_2, ...);

It is sufficient to add your types to the reader. They automatically apply to the other
components if you run them in the same pipeline as the reader. In fact, the CAS is
initialized from the merged TSDs of all components within a pipeline.

If you have other non-dynamic types, you can merge them with your dynamically created TSD
using something like this:

tsd = CasCreationUtils.mergeTypeSystems(
  asList(tsd, TypeSystemDescriptionFactory.createTypeSystemDescription()));
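To make the merge step concrete without depending on the UIMA jars, here is a toy model in plain Java: each component's type system is reduced to a map from type name to supertype name, and the pipeline sees the union of all of them. TypeMerge and merge are hypothetical names; the real CasCreationUtils.mergeTypeSystems also merges feature declarations and has its own rules for handling conflicts, which this sketch does not reproduce.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a "type system" here is just typeName -> supertypeName.
// Real UIMA TypeSystemDescriptions also carry features, which are ignored here.
final class TypeMerge {
    private TypeMerge() {}

    // Union of all components' types; a type declared by several components
    // must agree on its supertype, otherwise the merge is rejected.
    static Map<String, String> merge(List<Map<String, String>> componentTypeSystems) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> tsd : componentTypeSystems) {
            for (Map.Entry<String, String> e : tsd.entrySet()) {
                String previous = merged.putIfAbsent(e.getKey(), e.getValue());
                if (previous != null && !previous.equals(e.getValue())) {
                    throw new IllegalArgumentException("Conflicting supertypes for " + e.getKey()
                            + ": " + previous + " vs " + e.getValue());
                }
            }
        }
        return merged;
    }
}
```

For example, a reader contributing a dynamic type such as org.example.foobar.qw.FirstName and an annotator contributing org.example.Token (both hypothetical names) would each see both types after the merge, which is why declaring the dynamic types on the reader alone is enough.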

If you work with dynamically created types, you can largely forget about JCas and just use
the CAS interface. Once you start thinking about using reflection on UIMA types, the time has
come to switch from JCas to CAS. Of course, you can mix both approaches and still use JCas for
the non-dynamic types in your annotator/reader.
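As a rough illustration of why the CAS interface suits runtime-generated types: with the real API you look a type up by name (cas.getTypeSystem().getType(name)), create an annotation with cas.createAnnotation(type, begin, end), set features via type.getFeatureByBaseName(...) and setStringValue(...), and call cas.addFsToIndexes(...), so no generated class is involved. The toy below mimics only that name-keyed style with plain Java collections so that it runs without UIMA; GenericCas, Ann, add and select are hypothetical names, not UIMA classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for CAS-style access (NOT the UIMA API): annotations are generic
// records addressed by type name and feature name, so no generated per-type
// Java class is needed for types that only exist at runtime.
final class GenericCas {
    static final class Ann {
        final String type;
        final int begin;
        final int end;
        final Map<String, String> features;

        Ann(String type, int begin, int end, Map<String, String> features) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.features = new LinkedHashMap<>(features);
        }
    }

    private final List<Ann> index = new ArrayList<>();

    // Plays the role of creating an annotation of a named type and indexing it.
    Ann add(String typeName, int begin, int end, Map<String, String> features) {
        Ann a = new Ann(typeName, begin, end, features);
        index.add(a);
        return a;
    }

    // Selects by exact type name; real UIMA indexes would also return subtypes.
    List<Ann> select(String typeName) {
        List<Ann> result = new ArrayList<>();
        for (Ann a : index) {
            if (a.type.equals(typeName)) {
                result.add(a);
            }
        }
        return result;
    }
}
```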


-- Richard

On 31.03.2014, at 07:14, Andrew MacKinlay <am@akmy.net> wrote:

> Ah, thanks. That's probably nicer than my current implementation, where every type has
> to be handled in two places, but I don't think it will quite work for me, for a couple of
> reasons which I didn't articulate in my initial post. Firstly, to complicate things a little,
> the annotation type string, which the current implementation expects to be a single word,
> is actually now a URI. My type system description creation code converts this to a
> fully-qualified dotted Java/UIMA type name.
> In principle, I guess I could do something similar for a fully-qualified type name, but in
> practice guaranteeing uniqueness for a type name converted from a URI is pretty much
> impossible if you want it to stay human-readable ("http://foo-bar.example.org/qw#first-name"
> and "http://foo-bar.example.org/qw/first-name" currently map to the same thing, so I add an
> arbitrary suffix when there are collisions). This means the conversion is lossy, even if in
> practice such a collision would almost certainly not occur.
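One way to keep such a lossy conversion manageable is to never recompute the URI from the type name, but to remember the mapping in both directions. The sketch below assumes a naming scheme (reversed host labels and path segments as the package, the last path segment or the fragment as a CamelCase class name, and a numeric suffix on collisions); UriTypeNames and its methods are hypothetical and not the actual conversion code described above.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps annotation-type URIs to readable UIMA type names.
// Because the naming is lossy, both directions are remembered explicitly.
final class UriTypeNames {
    private final Map<String, String> typeByUri = new HashMap<>();
    private final Map<String, String> uriByType = new HashMap<>();

    String typeNameFor(String uri) {
        String known = typeByUri.get(uri);
        if (known != null) {
            return known;
        }
        String base = baseName(URI.create(uri));
        String candidate = base;
        // Collision: append 2, 3, ... until the name is unused (an arbitrary suffix scheme).
        for (int n = 2; uriByType.containsKey(candidate); n++) {
            candidate = base + n;
        }
        typeByUri.put(uri, candidate);
        uriByType.put(candidate, uri);
        return candidate;
    }

    // Reverse lookup works only because the mapping was stored, not recomputed.
    String uriFor(String typeName) {
        return uriByType.get(typeName);
    }

    // Assumes hierarchical http(s) URIs with a host and at least one path segment or fragment.
    static String baseName(URI uri) {
        List<String> parts = new ArrayList<>();
        String[] hostLabels = uri.getHost().split("\\.");
        for (int i = hostLabels.length - 1; i >= 0; i--) {
            parts.add(identifier(hostLabels[i], false));
        }
        List<String> segments = new ArrayList<>();
        for (String s : uri.getPath().split("/")) {
            if (!s.isEmpty()) {
                segments.add(s);
            }
        }
        if (uri.getFragment() != null) {
            segments.add(uri.getFragment());
        }
        if (segments.isEmpty()) {
            throw new IllegalArgumentException("No local name in " + uri);
        }
        for (int i = 0; i < segments.size() - 1; i++) {
            parts.add(identifier(segments.get(i), false));
        }
        parts.add(identifier(segments.get(segments.size() - 1), true));
        return String.join(".", parts);
    }

    // Drops non-alphanumeric characters: "foo-bar" -> "foobar", or "first-name" -> "FirstName" for classes.
    static String identifier(String raw, boolean className) {
        StringBuilder sb = new StringBuilder();
        for (String word : raw.split("[^A-Za-z0-9]+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (className) {
                sb.append(Character.toUpperCase(word.charAt(0))).append(word.substring(1));
            } else {
                sb.append(word.toLowerCase());
            }
        }
        if (sb.length() == 0 || Character.isDigit(sb.charAt(0))) {
            sb.insert(0, '_');
        }
        return sb.toString();
    }
}
```

With this scheme the two example URIs yield org.example.foobar.qw.FirstName and org.example.foobar.qw.FirstName2. Which URI receives the suffix depends on the order in which the URIs are first seen, so the mapping would have to be persisted (or the input sorted) if type names must stay stable across runs.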
> Secondly, I guess my current hard-coded solution for managing the types implies that the
> set of types is stable enough that it would be feasible to implement most of them manually,
> with the unknown-item fallback. However, that was really a quick-and-dirty solution for a
> demo, and I'm no longer convinced that manual static implementations of *any* leaf annotation
> types are the Right Thing To Do. Among other considerations, these types are stored
> dynamically within the web service and are really properties of the particular data set
> being exposed, rather than part of the web service's defined API.
> Thanks again,
> Andy
> On 31/03/2014, at 3:50 PM, Hugo Mougard wrote:
>> Hello,
>> I won't address the type system description part, but for the collection reader you could
>> use reflection to reduce the maintenance overhead (for example with the Guava library). The
>> idea would be to autodetect which types are present in a given package and use them
>> accordingly. The following snippet puts into a map the classes you can use, based on a given
>> package and on whether they are subclasses of Annotation: https://gist.github.com/m09/9885425
>> You could then use it like so, in the getItemAnnotationForType method:
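>> String annName = annType.replace("-", "").toLowerCase(Locale.ENGLISH);
>> if (annotations.containsKey(annName)) {
>>   // instantiate the matching JCas class through its JCas constructor
>>   return annotations.get(annName).getDeclaredConstructor(JCas.class).newInstance(jcas);
>> } else {
>>   // no matching class: fall back to the generic annotation
>>   return new UnknownItemAnnotation(jcas);
>> }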
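The gist itself is not reproduced here; the following is a self-contained sketch of the same idea using only the JDK: build a lookup table from candidate classes, then instantiate the matching class reflectively and fall back to an unknown-item type. Context, BaseAnnotation, FirstName, UnknownItem and AnnotationRegistry are hypothetical stand-ins for JCas, Annotation, a generated JCas type and UnknownItemAnnotation, and the classpath-scanning step (e.g. Guava's ClassPath) is replaced by an explicit list of candidate classes.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins: Context ~ JCas, BaseAnnotation ~ Annotation,
// FirstName ~ a JCas-generated type, UnknownItem ~ UnknownItemAnnotation.
class Context {}

class BaseAnnotation {
    final Context ctx;
    BaseAnnotation(Context ctx) { this.ctx = ctx; }
}

class FirstName extends BaseAnnotation {
    FirstName(Context ctx) { super(ctx); }
}

class UnknownItem extends BaseAnnotation {
    UnknownItem(Context ctx) { super(ctx); }
}

final class AnnotationRegistry {
    private final Map<String, Class<? extends BaseAnnotation>> byKey = new HashMap<>();

    // Keeps only proper subclasses of BaseAnnotation, keyed by lower-cased simple name.
    AnnotationRegistry(Collection<? extends Class<?>> candidates) {
        for (Class<?> c : candidates) {
            if (BaseAnnotation.class.isAssignableFrom(c) && c != BaseAnnotation.class) {
                byKey.put(c.getSimpleName().toLowerCase(Locale.ENGLISH), c.asSubclass(BaseAnnotation.class));
            }
        }
    }

    // Same key normalization as the snippet above; unknown names fall back to UnknownItem.
    BaseAnnotation create(String annType, Context ctx) {
        String key = annType.replace("-", "").toLowerCase(Locale.ENGLISH);
        Class<? extends BaseAnnotation> cls = byKey.get(key);
        if (cls == null) {
            return new UnknownItem(ctx);
        }
        try {
            return cls.getDeclaredConstructor(Context.class).newInstance(ctx);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }
}
```

Built from FirstName.class and UnknownItem.class, the registry returns a FirstName for "first-name" and an UnknownItem for any unregistered name such as "last-name", mirroring the fallback in the snippet above.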
>> Best,
>> Hugo
>> On 03/31/2014 11:56 AM, Andrew MacKinlay wrote:
>>> Hi,
>>> I have a working CollectionReader implementation, based primarily on uimaFIT, which
>>> converts from a particular web service to UIMA annotations. It works OK, but the problem is
>>> that the web service has its own implicit dynamic type system, particularly for document
>>> annotations, and that is currently not handled well (I can put a 'type' string in a textual
>>> feature, but as far as I can tell UIMA is not set up to query over these kinds of
>>> annotations, so this seems suboptimal).
>>> I have now written code which can generate a TypeSystemDescription at runtime for the
>>> dynamic types, based on the web service output. However, I'm not sure how to integrate that
>>> most sensibly with my uimaFIT architecture. Does anyone have any ideas? I guess I could stop
>>> using uimaFIT altogether (maybe it's not the right solution here), although I'm also not
>>> entirely sure of the best way to handle this in classic UIMA.
>>> I'd like to keep using uimaFIT if possible, though: many other types, particularly those
>>> relating to overall document metadata, are already working very nicely and succinctly with
>>> uimaFIT.
>>> BTW, the current CollectionReader implementation, which hard-codes handling of some types,
>>> and uses the textual string fallback in other cases, can be found at
>>> https://bitbucket.org/andymackinlay/uimavlab/src/c178fa9ebf5d5ffcad0249dd165ca44cde8dcefd/src/main/java/com/nicta/uimavlab/ItemListCollectionReader.java?at=default
>>> Thanks,
>>> Andy
