uima-user mailing list archives

From Andrew MacKinlay ...@akmy.net>
Subject Re: Preferably using UIMAfit, how can I dynamically generate types for a CollectionReader
Date Tue, 01 Apr 2014 00:19:11 GMT
Fantastic, thanks very much for the tips. I had just stumbled across createReaderDescription
(after my initial post, of course), and it's reassuring to know that it should just work (I
don't think I'll need to manually merge, but it's useful to know how if needed). It's also
useful to know about the practical differences between CAS and JCas, which I'd never really
worked out before.


On 31/03/2014, at 5:59 PM, Richard Eckart de Castilho wrote:

> You can pass in your TSD to the reader.
> createReaderDescription(YourReader.class, tsd, PARAM_1, value_1, PARAM_2, value_2, ...)
> It is sufficient to add your types to the reader. They will automatically apply to other
> components if you run them in the same pipeline as the reader. In fact, the CAS will be
> initialized from the merged TSDs of all components within a pipeline.
> If you have other non-dynamic types, you can merge them with your dynamically created
> ones using something like this:
> tsd = CasCreationUtils.mergeTypeSystems(
>  asList(tsd, TypeSystemDescriptionFactory.createTypeSystemDescription()));
> If you work with dynamically created types, you can largely forget about using JCas and
> go with the CAS interface. If one starts thinking about using reflection on UIMA types,
> the time has come to switch from JCas to CAS. Of course you can mix both approaches and still
> use JCas for the non-dynamic types in your annotator/reader.
> Cheers,
> -- Richard
> On 31.03.2014, at 07:14, Andrew MacKinlay <am@akmy.net> wrote:
>> Ah, thanks - that's probably nicer than my current implementation where every type
has to be handled in two places, but I think it's not exactly going to work for me for a couple
of reasons, which I didn't articulate in my initial post. Firstly, to complicate things a
little, that annotation type string, which that current implementation expects to be a single
word, is actually now a URI. My type system description creation code converts this to a fully-qualified
dotted Java/UIMA type name. 
>> In principle, I guess I could do something similar for a fully-qualified type name,
but in practice guaranteeing uniqueness for a type name converted from a URL is pretty much
impossible if you want human-readability ("http://foo-bar.example.org/qw#first-name" and "http://foo-bar.example.org/qw/first-name"
map to the same thing currently, so I add an arbitrary suffix if there are collisions), which
means that the conversion is lossy, even if practically this would almost certainly not occur.
>> Secondly, I guess my current hard-coded solution for managing the types implies that
the set of types is stable enough that it would be feasible to implement most of them manually,
with the unknown item fallback. However, this was in fact a quick-and-dirty solution for a
demo, and I'm no longer convinced that manual static implementations of *any* leaf annotation
types are the Right Thing To Do, due to various considerations such as the fact that these
types are stored dynamically within the web service and are really properties of a particular
data set which is being exposed, rather than part of the defined API of the web service.
>> Thanks again,
>> Andy
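
The URI-to-type-name conversion described in this message could look roughly like the sketch below. It is not Andy's actual code: the class UriTypeNamer, its method names, the reversed-host package scheme and the "_2", "_3" collision suffixes are all assumptions made for illustration. Because two URIs can produce the same readable name, the sketch keeps a lookup table in both directions so that the original URI can still be recovered from the generated name.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps annotation-type URIs to dotted, Java-style type names. */
final class UriTypeNamer {
    private final Map<String, String> nameByUri = new HashMap<>();
    private final Map<String, String> uriByName = new HashMap<>();

    /** Returns a unique type name for the URI, reusing the name on repeated calls. */
    String typeNameFor(String uri) {
        String known = nameByUri.get(uri);
        if (known != null) {
            return known;
        }
        String base = convert(uri);
        String candidate = base;
        // The readable conversion is lossy, so resolve collisions with a numeric suffix.
        for (int n = 2; uriByName.containsKey(candidate); n++) {
            candidate = base + "_" + n;
        }
        nameByUri.put(uri, candidate);
        uriByName.put(candidate, uri);
        return candidate;
    }

    /** Reverse lookup: the URI a generated name stands for, or null if unknown. */
    String uriFor(String typeName) {
        return uriByName.get(typeName);
    }

    /** Reversed host labels and path segments form the package; the last part is the type. */
    static String convert(String uriText) {
        URI uri = URI.create(uriText);
        List<String> parts = new ArrayList<>();
        String host = uri.getHost() == null ? "" : uri.getHost();
        String[] labels = host.split("\\.");
        for (int i = labels.length - 1; i >= 0; i--) {
            if (!labels[i].isEmpty()) parts.add(labels[i]);
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) parts.add(segment);
        }
        if (uri.getFragment() != null && !uri.getFragment().isEmpty()) {
            parts.add(uri.getFragment());
        }
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("No usable name parts in " + uriText);
        }
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < parts.size() - 1; i++) {
            name.append(packagePart(parts.get(i))).append('.');
        }
        return name.append(simpleName(parts.get(parts.size() - 1))).toString();
    }

    private static String packagePart(String raw) {
        String s = raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "_");
        return Character.isLetter(s.charAt(0)) ? s : "p" + s;
    }

    private static String simpleName(String raw) {
        StringBuilder sb = new StringBuilder();
        for (String token : raw.split("[^A-Za-z0-9]+")) {
            if (token.isEmpty()) continue;
            sb.append(Character.toUpperCase(token.charAt(0))).append(token.substring(1));
        }
        if (sb.length() == 0 || !Character.isLetter(sb.charAt(0))) {
            sb.insert(0, 'T');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        UriTypeNamer namer = new UriTypeNamer();
        System.out.println(namer.typeNameFor("http://foo-bar.example.org/qw#first-name"));
        System.out.println(namer.typeNameFor("http://foo-bar.example.org/qw/first-name"));
    }
}
```

With this scheme the two example URIs from the message collide on the same readable name, and whichever is seen second receives the suffix. The suffix therefore depends on encounter order, so the generated names are only stable within one run unless the table is persisted. The sketch also does not guard against Java reserved words appearing as package parts.
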
>> On 31/03/2014, at 3:50 PM, Hugo Mougard wrote:
>>> Hello,
>>> I won't address the type system description part, but about the collection reader,
you could make use of reflection to ease the maintenance overhead (for example with the Guava
library). The idea would be to autodetect if types are present in a given package and use them
accordingly. The following snippet will put in a map the classes that you can use based on
a given package and the fact that they implement Annotation: https://gist.github.com/m09/9885425
>>> You could then use it like so, in the getItemAnnotationForType method:
>>> String annName = annType.replace("-", "").toLowerCase(Locale.ENGLISH);
>>> if (annotations.containsKey(annName)) {
>>>  return annotations.get(annName).getDeclaredConstructor(JCas.class).newInstance(jcas);
>>> } else {
>>>  // No matching class was detected: fall back to the generic annotation.
>>>  return new UnknownItemAnnotation(jcas);
>>> }
>>> Best,
>>> Hugo
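
The lookup-with-fallback idea in Hugo's message can be tried out without UIMA on the classpath using the sketch below. It is only an illustration under stated assumptions: FakeCas stands in for JCas, ItemAnnotation and FirstName are made-up stand-ins for generated annotation classes, and AnnotationRegistry uses explicit registration rather than the Guava classpath scanning from the linked gist.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Stand-in for JCas so the sketch compiles without UIMA (assumption). */
final class FakeCas {}

/** Hypothetical base class playing the role of a UIMA annotation. */
class ItemAnnotation {
    final FakeCas cas;
    ItemAnnotation(FakeCas cas) { this.cas = cas; }
}

/** Hypothetical hand-written annotation type. */
class FirstName extends ItemAnnotation {
    FirstName(FakeCas cas) { super(cas); }
}

/** Fallback used when no specific class is registered for a type string. */
class UnknownItemAnnotation extends ItemAnnotation {
    UnknownItemAnnotation(FakeCas cas) { super(cas); }
}

/** Maps normalised type strings to classes and instantiates them reflectively. */
final class AnnotationRegistry {
    private final Map<String, Class<? extends ItemAnnotation>> annotations = new HashMap<>();

    /** Registers a class under its lower-cased simple name, e.g. "firstname". */
    void register(Class<? extends ItemAnnotation> type) {
        annotations.put(type.getSimpleName().toLowerCase(Locale.ENGLISH), type);
    }

    /** Creates the annotation for a type string such as "first-name". */
    ItemAnnotation create(String annType, FakeCas cas) {
        String annName = annType.replace("-", "").toLowerCase(Locale.ENGLISH);
        Class<? extends ItemAnnotation> type = annotations.get(annName);
        if (type == null) {
            return new UnknownItemAnnotation(cas);
        }
        try {
            // Every registered class is expected to have a single-argument constructor.
            return type.getDeclaredConstructor(FakeCas.class).newInstance(cas);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }
}
```

Registering FirstName.class and then calling create("first-name", cas) yields a FirstName instance, while an unregistered type string yields an UnknownItemAnnotation. Wrapping the reflective call in one place also keeps the checked reflection exceptions out of the reader code.
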
>>> On 03/31/2014 11:56 AM, Andrew MacKinlay wrote:
>>>> Hi,
>>>> I have a working CollectionReader implementation which converts from a particular
web service to UIMA annotations, based primarily on uimaFIT. It works OK, but the problem
is that the web service has its own implicit dynamic type system, particularly for document
annotations, and that is currently not being well-handled (I can put a 'type' string as a
textual feature, but UIMA is not set up to query over these kinds of annotations, as far as
I can tell, so it seems suboptimal).
>>>> I have now written code which can generate a TypeSystemDescription at runtime
for the dynamic types based on the web service output. However, I'm not sure how to most sensibly
integrate that with my uimaFIT architecture. Does anyone have any ideas? I guess I could stop
using uimaFIT altogether - maybe it's not the right solution here, (although I'm also not
entirely sure of the best way to handle this in classic UIMA).
>>>> I'd like to keep using uimaFIT if possible though - many other types, particularly
those relating to overall document metadata, are already working very nicely and succinctly
with uimaFIT.
>>>> BTW, the current CollectionReader implementation, which hard-codes handling
of some types, and uses the textual string fallback in other cases, can be found at https://bitbucket.org/andymackinlay/uimavlab/src/c178fa9ebf5d5ffcad0249dd165ca44cde8dcefd/src/main/java/com/nicta/uimavlab/ItemListCollectionReader.java?at=default
>>>> Thanks,
>>>> Andy
