Subject: Re: Preferably using UIMAfit, how can I dynamically generate types for a CollectionReader
From: Richard Eckart de Castilho
Date: Mon, 31 Mar 2014 08:59:52 +0200
To: user@uima.apache.org

You can pass in your TSD to the reader.

  createReaderDescription(YourReader.class, tsd, PARAM_1, value_1, PARAM_2, value_2, ...)

It is sufficient to add your types to the reader. They will automatically apply to other components if you run them in the same pipeline as the reader.
In fact, the CAS will be initialized from the merged TSDs in all components within a pipeline.

If you have other non-dynamic types you can merge them with your dynamically created TSD using something like this

  tsd = CasCreationUtils.mergeTypeSystems(
    asList(tsd, TypeSystemDescriptionFactory.createTypeSystemDescription()));

If you work with dynamically created types, you can largely forget about using JCas and just go with the CAS interface. If one starts thinking about using reflection on UIMA types, the time has come to switch from JCas to CAS. Of course you can mix both approaches and still use JCas for the non-dynamic types in your annotator/reader.

Cheers,

-- Richard

On 31.03.2014, at 07:14, Andrew MacKinlay wrote:

> Ah, thanks - that's probably nicer than my current implementation where every type has to be handled in two places, but I think it's not exactly going to work for me for a couple of reasons, which I didn't articulate in my initial post. Firstly, to complicate things a little, that annotation type string, which that current implementation expects to be a single word, is actually now a URI. My type system description creation code converts this to a fully-qualified dotted Java/UIMA type name.
>
> In principle, I guess I could do something similar for a fully-qualified type name, but in practice guaranteeing uniqueness for a type name converted from a URL is pretty much impossible if you want human-readability ("http://foo-bar.example.org/qw#first-name" and "http://foo-bar.example.org/qw/first-name" map to the same thing currently, so I add an arbitrary suffix if there are collisions), which means that the conversion is lossy, even if practically this would almost certainly not occur.
>
> Secondly, I guess my current hard-coded solution for managing the types implies that the set of types is stable enough that it would be feasible to implement most of them manually, with the unknown item fallback. However, this was in fact a quick-and-dirty solution for a demo, and I'm no longer convinced that manual static implementations of *any* leaf annotation types are the Right Thing To Do, due to various considerations such as the fact that these types are stored dynamically within the web service and are really properties of a particular data set which is being exposed, rather than part of the defined API of the web service.
>
>
> Thanks again,
> Andy
>
>
> On 31/03/2014, at 3:50 PM, Hugo Mougard wrote:
>
>> Hello,
>>
>> I won't address the type system description part, but about the collection reader, you could make use of reflection to ease the maintenance overhead (for example with the Guava library). The idea would be to autodetect if types are present in a given package and use them accordingly. The following snippet will put in a map the classes that you can use based on a given package and the fact that they implement Annotation: https://gist.github.com/m09/9885425
>>
>> You could then use it like so, in the getItemAnnotationForType method:
>>
>> String annName = annType.replace("-", "").toLowerCase(Locale.ENGLISH);
>> if (annotations.containsKey(annName)) {
>>     return annotations.get(annName).getDeclaredConstructor(JCas.class).newInstance(jcas);
>> } else {
>>     return new UnknownItemAnnotation(jcas);
>> }
>>
>> Best,
>> Hugo
>>
>> On 03/31/2014 11:56 AM, Andrew MacKinlay wrote:
>>> Hi,
>>>
>>> I have a working CollectionReader implementation which converts from a particular web service to UIMA annotations, based primarily on uimaFIT.
>>> It works OK, but the problem is that the web service has its own implicit dynamic type system, particularly for document annotations, and that is currently not being well-handled (I can put a 'type' string as a textual feature, but UIMA is not set up to query over these kinds of annotations, as far as I can tell, so it seems suboptimal).
>>>
>>> I have now written code which can generate a TypeSystemDescription at runtime for the dynamic types based on the web service output. However, I'm not sure how to most sensibly integrate that with my uimaFIT architecture. Does anyone have any ideas? I guess I could stop using uimaFIT altogether - maybe it's not the right solution here (although I'm also not entirely sure of the best way to handle this in classic UIMA).
>>>
>>> I'd like to keep using uimaFIT if possible though - many other types, particularly those relating to overall document metadata, are already working very nicely and succinctly with uimaFIT.
>>>
>>>
>>> BTW, the current CollectionReader implementation, which hard-codes handling of some types, and uses the textual string fallback in other cases, can be found at https://bitbucket.org/andymackinlay/uimavlab/src/c178fa9ebf5d5ffcad0249dd165ca44cde8dcefd/src/main/java/com/nicta/uimavlab/ItemListCollectionReader.java?at=default
>>>
>>>
>>> Thanks,
>>> Andy
>>>
>>
>>
>
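
The thread describes converting annotation-type URIs into dotted Java/UIMA type names and adding a suffix on collisions, but does not show that code. The following is a minimal sketch of one way this could work, using only the Java standard library. The class name TypeNameMapper, its methods, and the naming scheme (reversed host labels, lower-cased path segments, CamelCase last segment, numeric suffix on collision) are all assumptions made for illustration and are not taken from the uimavlab code. Because the readable name alone is lossy, the sketch keeps a reverse map so the original URI can still be recovered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps annotation-type URIs to unique dotted type names. */
final class TypeNameMapper {
    private final Map<String, String> uriToName = new HashMap<>();
    private final Map<String, String> nameToUri = new HashMap<>();

    /** Returns a stable, unique type name for the URI, suffixing on collision. */
    synchronized String toTypeName(String uri) {
        String known = uriToName.get(uri);
        if (known != null) {
            return known; // same URI always yields the same name
        }
        String base = readableName(uri);
        String candidate = base;
        int suffix = 2;
        // Two different URIs can share a readable name; disambiguate with _2, _3, ...
        while (nameToUri.containsKey(candidate)) {
            candidate = base + "_" + suffix++;
        }
        uriToName.put(uri, candidate);
        nameToUri.put(candidate, uri);
        return candidate;
    }

    /** Reverse lookup; returns null for names this mapper did not produce. */
    synchronized String toUri(String typeName) {
        return nameToUri.get(typeName);
    }

    /** Lossy, human-readable conversion: '#' and '/' are treated alike. */
    static String readableName(String uri) {
        String rest = uri.replaceFirst("^[A-Za-z][A-Za-z0-9+.-]*://", "");
        String[] parts = rest.split("[/#?]+");
        List<String> segments = new ArrayList<>();
        // Host labels in reverse order, as in Java package naming.
        String[] labels = parts[0].split("\\.");
        for (int i = labels.length - 1; i >= 0; i--) {
            segments.add(identifier(labels[i]));
        }
        // Remaining segments: packages in lower case, final one as a type name.
        for (int i = 1; i < parts.length; i++) {
            boolean last = (i == parts.length - 1);
            segments.add(last ? camelCase(parts[i]) : identifier(parts[i]));
        }
        return String.join(".", segments);
    }

    private static String identifier(String raw) {
        return startSafely(raw.replaceAll("[^A-Za-z0-9_]", "_").toLowerCase(Locale.ROOT));
    }

    private static String camelCase(String raw) {
        StringBuilder sb = new StringBuilder();
        for (String word : raw.split("[^A-Za-z0-9]+")) {
            if (!word.isEmpty()) {
                sb.append(Character.toUpperCase(word.charAt(0))).append(word.substring(1));
            }
        }
        return startSafely(sb.toString());
    }

    /** Identifiers must not be empty or start with a digit. */
    private static String startSafely(String s) {
        return (s.isEmpty() || Character.isDigit(s.charAt(0))) ? "_" + s : s;
    }
}
```

With this sketch, the two example URIs from the thread share the readable name org.example.foo_bar.qw.FirstName, so whichever is registered second receives the _2 suffix. Which URI gets the suffix depends on registration order, so the mapping would need to be persisted, or built in a deterministic order, if the generated names must stay stable across runs.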