uima-user mailing list archives

From Andrew MacKinlay ...@akmy.net>
Subject Re: Preferably using UIMAfit, how can I dynamically generate types for a CollectionReader
Date Mon, 31 Mar 2014 05:14:13 GMT
Ah, thanks - that's probably nicer than my current implementation, where every type has to be
handled in two places, but I don't think it's going to work for me, for a couple of reasons
which I didn't articulate in my initial post. Firstly, to complicate things a little, the
annotation type string, which the current implementation expects to be a single word, is
actually now a URI. My type system description creation code converts this to a fully-qualified
dotted Java/UIMA type name.

In principle, I guess I could do something similar for a fully-qualified type name, but in
practice guaranteeing uniqueness for a type name converted from a URI is pretty much impossible
if you want human-readability ("http://foo-bar.example.org/qw#first-name" and "http://foo-bar.example.org/qw/first-name"
map to the same thing currently, so I add an arbitrary suffix if there are collisions). That
means the conversion is lossy, even if in practice a collision would almost certainly not occur.
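
To make the collision problem concrete, here is a minimal sketch of one way such a conversion could work. The class name UriTypeNamer and the naming scheme (reversed host, then path segments, then fragment, with the last segment camel-cased) are assumptions for illustration, not the actual conversion code. The sketch keeps a two-way map so that the exact URI can still be recovered from a suffixed name, even though the readable conversion on its own is lossy.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: assigns readable, unique type names to annotation-type URIs. */
class UriTypeNamer {
    private final Map<String, String> uriToName = new HashMap<>();
    private final Map<String, String> nameToUri = new HashMap<>();

    /** Returns the name already assigned to this URI, or assigns a fresh unique one. */
    String typeNameFor(String uri) {
        String known = uriToName.get(uri);
        if (known != null) {
            return known;
        }
        String candidate = readableName(uri);
        String name = candidate;
        // On a collision, append an arbitrary numeric suffix until the name is free.
        for (int suffix = 2; nameToUri.containsKey(name); suffix++) {
            name = candidate + suffix;
        }
        uriToName.put(uri, name);
        nameToUri.put(name, uri);
        return name;
    }

    /** Reverse lookup: the exact URI a type name was generated from, or null. */
    String uriFor(String typeName) {
        return nameToUri.get(typeName);
    }

    /** Lossy conversion (assumed scheme): reversed host, path segments, then fragment. */
    private static String readableName(String uriString) {
        URI uri = URI.create(uriString);
        List<String> raw = new ArrayList<>();
        if (uri.getHost() != null) {
            String[] labels = uri.getHost().split("\\.");
            for (int i = labels.length - 1; i >= 0; i--) {
                raw.add(labels[i]);
            }
        }
        if (uri.getPath() != null) {
            for (String seg : uri.getPath().split("/")) {
                raw.add(seg);
            }
        }
        if (uri.getFragment() != null) {
            raw.add(uri.getFragment());
        }

        // Keep only segments that contain at least one ASCII letter or digit.
        List<String> segs = new ArrayList<>();
        for (String s : raw) {
            if (s.matches(".*[A-Za-z0-9].*")) {
                segs.add(s);
            }
        }
        if (segs.isEmpty()) {
            return "UnknownType";
        }

        // All but the last segment become lower-case package components.
        StringBuilder name = new StringBuilder();
        for (String s : segs.subList(0, segs.size() - 1)) {
            String pkg = s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
            name.append(identifier(pkg)).append('.');
        }
        // The last segment becomes the camel-cased simple type name.
        StringBuilder simple = new StringBuilder();
        for (String word : segs.get(segs.size() - 1).split("[^A-Za-z0-9]+")) {
            if (!word.isEmpty()) {
                simple.append(word.substring(0, 1).toUpperCase(Locale.ROOT))
                      .append(word.substring(1));
            }
        }
        return name.append(identifier(simple.toString())).toString();
    }

    /** Prefixes an underscore when a component would otherwise start with a digit. */
    private static String identifier(String s) {
        return Character.isDigit(s.charAt(0)) ? "_" + s : s;
    }
}
```

Under this assumed scheme, both example URIs above produce the candidate org.example.foobar.qw.FirstName, so whichever is seen second gets a suffix. Which URI ends up with the suffix depends on the order in which they are encountered, so the name-to-URI map would have to be stored alongside the generated type system for the mapping to stay stable.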

Secondly, I guess my current hard-coded solution for managing the types implies that the set
of types is stable enough that it would be feasible to implement most of them manually, with
the unknown item fallback. However, this was in fact a quick-and-dirty solution for a demo,
and I'm no longer convinced that manual static implementations of *any* leaf annotation types
are the Right Thing To Do, mainly because these types are stored dynamically within the web
service and are really properties of a particular data set which is being exposed, rather
than part of the defined API of the web service.

Thanks again,

On 31/03/2014, at 3:50 PM, Hugo Mougard wrote:

> Hello,
> I won't address the type system description part, but about the collection reader, you
> could make use of reflection to ease the maintenance overhead (for example with the guava
> library). The idea would be to autodetect if types are present in a given package and use them
> accordingly. The following snippet will put in a map the classes that you can use based on
> a given package and the fact that they implement Annotation: https://gist.github.com/m09/9885425
> You could then use it like so, in the getItemAnnotationForType method:
> String annName = annType.replace("-", "").toLowerCase(Locale.ENGLISH);
> if (annotations.containsKey(annName)) {
>    return annotations.get(annName).getDeclaredConstructor(JCas.class).newInstance(jcas);
> } else {
>    // Fall back to the generic annotation for types with no matching class.
>    return new UnknownItemAnnotation(jcas);
> }
> Best,
> Hugo
> On 03/31/2014 11:56 AM, Andrew MacKinlay wrote:
>> Hi,
>> I have a working CollectionReader implementation which converts from a particular
>> web service to UIMA annotations, based primarily on uimaFIT. It works OK, but the problem
>> is that the web service has its own implicit dynamic type system, particularly for document
>> annotations, and that is currently not being well-handled (I can put a 'type' string as a
>> textual feature, but UIMA is not set up to query over these kinds of annotations, as far as
>> I can tell, so it seems suboptimal).
>> I have now written code which can generate a TypeSystemDescription at runtime for
>> the dynamic types based on the web service output. However, I'm not sure how to most sensibly
>> integrate that with my uimaFIT architecture. Does anyone have any ideas? I guess I could stop
>> using uimaFIT altogether - maybe it's not the right solution here (although I'm also not
>> entirely sure of the best way to handle this in classic UIMA).
>> I'd like to keep using uimaFIT if possible though - many other types, particularly
>> those relating to overall document metadata, are already working very nicely and succinctly
>> with uimaFIT.
>> BTW, the current CollectionReader implementation, which hard-codes handling of some
>> types, and uses the textual string fallback in other cases, can be found at https://bitbucket.org/andymackinlay/uimavlab/src/c178fa9ebf5d5ffcad0249dd165ca44cde8dcefd/src/main/java/com/nicta/uimavlab/ItemListCollectionReader.java?at=default
>> Thanks,
>> Andy
