uima-user mailing list archives

From Petr Baudis <pa...@ucw.cz>
Subject Re: UIMAj3 ideas
Date Thu, 16 Jul 2015 16:52:41 GMT
On Fri, Jul 10, 2015 at 01:37:27PM -0400, Marshall Schor wrote:
> On 7/9/2015 6:52 PM, Petr Baudis wrote:
> <snip...>
> https://cwiki.apache.org/confluence/display/UIMA/Ideas+for+UIMAJ+v3
> >   I didn't figure out how to edit that wiki page, 
> Due to spammers, we had to turn off public editing.  However, I can add you to a
> list ( to do this, you have to "register" for a user id on the wiki, and then
> send me offline what that Id is ), but even without being on the list, there's a
> comment button which (I think) lets you add comments at the bottom.
> > but a mental summary
> > of the things I find currently irritating about UIMA and would love to
> > see changed formed in my mind, so I thought I could contribute it for
> > discussion.
> Great!
> >
> >   * UIMAfit is not part of core UIMA and UIMA-AS is not part of core
> >     UIMA.  It seems to me that UIMA-AS is doing things a bit differently
> >     than what the original UIMA idea of doing scaleout was.  The two
> >     things don't play well together.  I'd love a way to easily take
> >     my plain UIMA pipeline and scale it out, ideally without any code
> >     changes, *and* avoid the terrible XML config files.
> Any specifics of what to change here would be helpful.  UIMA-AS was designed to
> enable scale-out without changing the core UIMA pipeline or its XML
> descriptor.  The additional information for UIMA-AS scaleout was put into a
> separate XML descriptor which "embeds" the original plain UIMA one.

  I'm sure Richard would be able to explain this better, but I think one
of the core issues is that UIMA-AS embeds the XML descriptor instead of
the AnalysisEngineDescription.  So when I want to use it together with
an AnalysisEngineDescription built with UIMAfit instead, it's time to
start making crazy workarounds like


> >   * Connected with the above - I'd love .addToIndexes() to just
> >     disappear.  Right now, the paradigm is that you build an annotation
> >     in an annotator, and the moment it gets saved in a CAS, it becomes
> >     basically read-only.  
> You certainly can modify any of an Annotation's features subsequently.
> I'm guessing you're referring to another idea - adding additional features that were
> not initially defined in the UIMA type system.

  Sorry for the confusion, but that's not quite what I had in mind.
I believe that right now, in order to modify the value of a feature,
you first need to remove the feature structure from the index, change
the value, and then re-add it.  Is that a misconception?
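
  To make the question concrete, here is a minimal plain-Java sketch of
why a sorted index can require the remove / change / re-add sequence.
It is only an analogy: Span and SortedIndexSketch are hypothetical
names, a TreeSet stands in for a sorted index, and nothing here is UIMA
API.  The assumption is that the concern applies to features used as
sort keys (such as begin and end offsets).

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical sorted index over spans; a TreeSet stands in for the real thing.
final class SortedIndexSketch {
    // Spans are ordered by begin offset, then by end offset.
    private final TreeSet<Span> index = new TreeSet<>(
        Comparator.comparingInt((Span s) -> s.begin).thenComparingInt(s -> s.end));

    void add(Span s) { index.add(s); }
    boolean contains(Span s) { return index.contains(s); }
    Span first() { return index.first(); }

    // Changing a sort key in place would leave the entry at a stale position,
    // so the entry is removed, changed, and then re-added.
    void updateBegin(Span s, int newBegin) {
        boolean wasIndexed = index.remove(s);
        s.begin = newBegin;
        if (wasIndexed) {
            index.add(s);
        }
    }

    public static void main(String[] args) {
        SortedIndexSketch idx = new SortedIndexSketch();
        Span a = new Span(5, 8);
        Span b = new Span(10, 15);
        idx.add(a);
        idx.add(b);
        idx.updateBegin(b, 0);
        System.out.println("b sorts first: " + (idx.first() == b));
        System.out.println("b still found: " + idx.contains(b));
    }
}

// Hypothetical stand-in for an annotation with a mutable begin offset.
final class Span {
    int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}
```

  If the key were changed while the span was still in the TreeSet, later
lookups could miss it, which is the kind of corruption the sequence
above avoids.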

> UIMA sets up the types and
> features once at the start of the pipeline run (from a merge of all the
> component's type systems), and locks down the type system.  Other frameworks
> sometimes allow an unlocked type system, where you could add (after a Feature
> Structure is created) additional features.  This is usually done by keeping a
> list of feature-name <-> feature-value pairs (such as your code snippet does,
> below).  We're thinking of including this capability in the version 3, with a
> bit of a twist - the intent would be to keep the "compilable" aspect of
> "locked-down" type/features (for high performance), while adding (for those use
> cases that want it) the other style of dynamically added additional features (at
> some cost in performance).  

  Still, this would be awesome and I'd totally make use of it!

  (The code in my original email, I guess, conflates two issues:
addToIndexes() and the lack of variable-sized lists, i.e. the Java
collection support issue.  Even if you decide generic collection / map
support would be too tricky, at least supporting variable-sized lists
would help a lot...)

> >   * I wondered about storing (arbitrary) graphs in the CAS, but the
> >     issues above make this really impractical.  If you also think about
> >     integrating microformats, you need to think about how to do this.
> We have had users store arbitrary graphs in the CAS, but, yes, it is not so
> efficient.  The main elements UIMA has for collections of references (to
> FeatureStructures) are the FSArray and FSList.  As you point out the FSArray is
> fixed length.  The FSList supports dynamic adding/removing etc. using the
> standard linked-list technique.  However, because UIMA data in the CAS
> (currently) is not garbage collected, you have to be careful when using this
> technique.

  ...oh, never mind.  After using UIMA heavily for well over a year,
I managed not to learn that FSList exists at all!  Thanks for this!

  I think that's a bug for the UIMA Tutorial, which mentions FSArray but
not FSList.  :-)
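
  For anyone else who missed FSList, here is a minimal sketch of the
head/tail linked-list idea Marshall describes, as opposed to a
fixed-length array.  ListNode, EmptyNode and ConsNode are hypothetical
plain-Java classes, not UIMA types; they only illustrate why growing
such a list needs no array copy.

```java
// Hypothetical head/tail linked list; prepending allocates one new cell
// and reuses the existing list as the tail.
abstract class ListNode<T> {
    abstract boolean isEmpty();

    static <T> ListNode<T> empty() {
        return new EmptyNode<>();
    }

    // Returns a new list whose head is value and whose tail is this list.
    ListNode<T> prepend(T value) {
        return new ConsNode<>(value, this);
    }

    // Walks the tail links until the empty marker is reached.
    int length() {
        int n = 0;
        for (ListNode<T> c = this; !c.isEmpty(); c = ((ConsNode<T>) c).tail) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ListNode<String> list = ListNode.<String>empty().prepend("b").prepend("a");
        System.out.println("length: " + list.length());
    }
}

// Marks the end of the list.
final class EmptyNode<T> extends ListNode<T> {
    @Override
    boolean isEmpty() {
        return true;
    }
}

// One list cell: an element plus a reference to the rest of the list.
final class ConsNode<T> extends ListNode<T> {
    final T head;
    final ListNode<T> tail;

    ConsNode(T head, ListNode<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    @Override
    boolean isEmpty() {
        return false;
    }
}
```

  In plain Java, cells that are no longer referenced are collected
automatically; per Marshall's note, data in the CAS currently is not,
which is the caveat with this technique.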

  (Another pain point here - I always ache when I need to work with
FSArray or, I guess, FSList, since neither carries the element type
information that is in the type system - I need to manually typecast
all the time and hope I don't make a mistake.)
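
  One way to confine the casting to a single place is a typed read-only
view over the untyped container.  The sketch below is plain Java under
stated assumptions: TypedArrayView is a hypothetical helper, and an
Object[] stands in for the untyped, fixed-length array; it is not UIMA
API.

```java
import java.util.AbstractList;
import java.util.List;

// Hypothetical typed, read-only view over an untyped fixed-length array.
final class TypedArrayView<T> extends AbstractList<T> {
    private final Object[] elements; // stands in for the untyped container
    private final Class<T> type;     // element type the caller expects

    TypedArrayView(Object[] elements, Class<T> type) {
        this.elements = elements;
        this.type = type;
    }

    // The only cast in the program; throws ClassCastException on a mismatch.
    @Override
    public T get(int index) {
        return type.cast(elements[index]);
    }

    @Override
    public int size() {
        return elements.length;
    }

    public static void main(String[] args) {
        Object[] raw = { "NN", "VB" };
        List<String> tags = new TypedArrayView<>(raw, String.class);
        for (String tag : tags) {
            System.out.println(tag);
        }
    }
}
```

  A mismatch still only shows up at run time, but it fails at the point
of access with a clear exception rather than somewhere downstream.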

> The above proposal to allow the common Java Collection objects (like ArrayList,
> and Maps) as things in the CAS, plus garbage collection, should make it much more
> convenient to store and work with graphs in the CAS.
> >
> >   * Complex pipelines are a bit clumsy.  I think the biggest obvious
> >     problem is lack of signalling to CAS merger that input CASes have
> >     been exhausted.  Having an "isLast" barrier sounds simple as long
> >     as you have only a single CAS multiplier paired with the CAS merger,
> >     but when this assumption breaks down, things start to deteriorate.
> >     However, I realize complex pipelines are a niche area.
> It would be nice to hear some ideas here.

  (After reading Eddie Epstein's email and coming back to some more of
his emails to me, I realize that the isLast hack I'm using is
unnecessary if I instead use the "process-parent-last" flag of
CASMultiplier.  I'm learning a lot from interacting here!  I guess that
shows we could always make use of more good UIMA code examples...)

				Petr Baudis
	If you have good ideas, good data and fast computers,
	you can do almost anything. -- Geoffrey Hinton
