uima-user mailing list archives

From Petr Baudis <pa...@ucw.cz>
Subject Re: UIMAj3 ideas
Date Thu, 16 Jul 2015 16:52:41 GMT
On Fri, Jul 10, 2015 at 01:37:27PM -0400, Marshall Schor wrote:
> On 7/9/2015 6:52 PM, Petr Baudis wrote:
> <snip...>
> 
> https://cwiki.apache.org/confluence/display/UIMA/Ideas+for+UIMAJ+v3
> 
> >   I didn't figure out how to edit that wiki page, 
> Due to spammers, we had to turn off public editing.  However, I can add you to a
> list ( to do this, you have to "register" for a user id on the wiki, and then
> send me offline what that Id is ), but even without being on the list, there's a
> comment button which (I think) lets you add comments at the bottom.
> > but a mental summary
> > of the things I find currently irritating about UIMA and would love to
> > see changed formed in my mind, so I thought I could contribute it for
> > discussion.
> Great!
> >
> >   * UIMAfit is not part of core UIMA and UIMA-AS is not part of core
> >     UIMA.  It seems to me that UIMA-AS is doing things a bit differently
> >     than what the original UIMA idea of doing scaleout was.  The two
> >     things don't play well together.  I'd love a way to easily take
> >     my plain UIMA pipeline and scale it out, ideally without any code
> >     changes, *and* avoid the terrible XML config files.
> Any specifics of what to change here would be helpful.  UIMA-AS was designed to
> enable scale-out without changing the core UIMA pipeline or its XML
> descriptor.  The additional information for UIMA-AS scaleout was put into a
> separate xml descriptor which "embeds" the original plain UIMA one.

  I'm sure Richard would be able to explain this better, but I think one
of the core issues is that UIMA-AS embeds the XML descriptor instead of
the AnalysisEngineDescription.  So when I want to use it together with
an AnalysisEngineDescription built with UIMAfit instead, I have to
resort to crazy workarounds like

	https://code.google.com/p/dkpro-lab/source/browse/de.tudarmstadt.ukp.dkpro.lab/de.tudarmstadt.ukp.dkpro.lab.uima.engine.uimaas/src/main/java/de/tudarmstadt/ukp/dkpro/lab/uima/engine/uimaas/component/SimpleService.java?name=14aeba50c8c1&r=14aeba50c8c18ea4d14c0d099f43c049f806d9db

> >   * Connected with the above - I'd love .addToIndexes() to just
> >     disappear.  Right now, the paradigm is that you build an annotation
> >     in an annotator, and the moment it gets saved in a CAS, it becomes
> >     basically read-only.  
> You certainly can modify any of an Annotation's features subsequently.
> I'm guessing you're referring to another idea - adding additional features that were
> not initially defined in the UIMA type system.

  Sorry for the confusion, but that's not quite what I had in mind.
I literally believe that right now, in order to modify the value of
a feature, you need to first remove the annotation from the index,
change the value, then re-add it.  Is that a misconception?
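
  The remove / change / re-add pattern can be illustrated without UIMA
at all.  The sketch below is only a plain-Java analogy, assuming the
index is a sorted structure keyed on a mutable field; Span,
SortedSpanIndex and updateBegin are hypothetical names, not UIMA API.
If the analogy holds, the constraint would matter only for features an
index actually sorts on.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for an annotation; not a UIMA class.
class Span {
    int begin;              // plays the role of an index key feature
    final String label;

    Span(int begin, String label) {
        this.begin = begin;
        this.label = label;
    }
}

// Hypothetical sorted index, ordered by begin offset and then by label.
class SortedSpanIndex {
    private final TreeSet<Span> index = new TreeSet<>(
            Comparator.comparingInt((Span s) -> s.begin)
                      .thenComparing(s -> s.label));

    void add(Span s) { index.add(s); }
    boolean contains(Span s) { return index.contains(s); }
    Span first() { return index.first(); }

    // Safe update: take the element out, change the key, put it back.
    void updateBegin(Span s, int newBegin) {
        boolean wasIndexed = index.remove(s);
        s.begin = newBegin;
        if (wasIndexed) {
            index.add(s);
        }
    }
}

class IndexKeyDemo {
    public static void main(String[] args) {
        SortedSpanIndex idx = new SortedSpanIndex();
        Span a = new Span(0, "a");
        Span b = new Span(10, "b");
        idx.add(a);
        idx.add(b);

        idx.updateBegin(a, 30);                 // remove, change, re-add
        System.out.println(idx.contains(a));    // a is still findable
        System.out.println(idx.first().label);  // b now sorts first
    }
}
```

  Assigning a.begin directly while a is still inside the TreeSet would
leave it filed under its old position, so later lookups can miss it.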

> UIMA sets up the types and
> features once at the start of the pipeline run (from a merge of all the
> components' type systems), and locks down the type system.  Other frameworks
> sometimes allow an unlocked type system, where you could add (after a Feature
> Structure is created) additional features.  This is usually done by keeping a
> list of feature-name <-> feature-value pairs (such as your code snippet does,
> below).  We're thinking of including this capability in the version 3, with a
> bit of a twist - the intent would be to keep the "compilable" aspect of
> "locked-down" type/features (for high performance), while adding (for those use
> cases that want it) the other style of dynamically added additional features (at
> some cost in performance).  

  Still, this would be awesome and I'd totally make use of it!
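
  To make the "locked-down plus dynamic" idea concrete, here is a
minimal plain-Java sketch, assuming fixed features become ordinary
fields and extra features live in a name-to-value map.  Token,
setExtra and getExtra are hypothetical names for illustration; this is
not how UIMA v3 is or will be implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical class for illustration; not part of UIMA.
class Token {
    // "Locked-down" features: declared up front, stored as plain fields.
    private final int begin;
    private final int end;

    // Dynamically added features: name -> value pairs, allocated lazily
    // so that tokens without extras pay no map overhead.
    private Map<String, Object> extras;

    Token(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    int getBegin() { return begin; }
    int getEnd() { return end; }

    void setExtra(String name, Object value) {
        if (extras == null) {
            extras = new HashMap<>();
        }
        extras.put(name, value);
    }

    // Returns the extra feature if it exists and has the requested type.
    <T> Optional<T> getExtra(String name, Class<T> type) {
        if (extras == null) {
            return Optional.empty();
        }
        Object value = extras.get(name);
        return type.isInstance(value)
                ? Optional.of(type.cast(value))
                : Optional.empty();
    }
}
```

  Access to begin and end stays a direct field read, while the map
lookup is the performance cost paid only by callers who use the
dynamic features.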

  (The code in my original email, I guess, conflates two issues: the
addToIndexes() requirement and the lack of variable-sized lists, i.e.
the Java collection support issue.  Even if you decide generic
collection / map support would be too tricky, at least supporting
variable-sized lists would help a lot...)

> >   * I wondered about storing (arbitrary) graphs in the CAS, but the
> >     issues above make this really impractical.  If you also think about
> >     integrating microformats, you need to think about how to do this.
> We have had users store arbitrary graphs in the CAS, but, yes, it is not so
> efficient.  The main elements UIMA has for collections of references (to
> FeatureStructures) are the FSArray and FSList.  As you point out the FSArray is
> fixed length.  The FSList supports dynamic adding/removing etc. using the
> standard linked-list technology.  However, because UIMA data in the CAS
> (currently) is not garbage collected, you have to be careful when using this
> technique.

  ...oh, never mind.  After using UIMA heavily for well over a year,
I managed not to learn that FSList exists at all!  Thanks for this
pointer.

  I think that's a bug for the UIMA Tutorial, which mentions FSArray but
not FSList.  :-)

  (Another pain point here - I always ache when I need to work with
FSArray or I guess FSList, since it does not carry the type information
that is in the typesystem - I need to manually typecast all the time
and hope I don't make a mistake.)
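
  One way to contain the casting is to do it once, in a single checked
place.  The sketch below is plain Java and assumes the untyped
container can be read out as an Object[]; TypedViews and checkedList
are hypothetical names, not UIMA or UIMAfit API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of UIMA.
class TypedViews {
    // Copies an untyped element array into a read-only List<T>,
    // verifying every element once instead of casting at each use site.
    static <T> List<T> checkedList(Object[] elements, Class<T> type) {
        List<T> out = new ArrayList<>(elements.length);
        for (int i = 0; i < elements.length; i++) {
            Object e = elements[i];
            if (e != null && !type.isInstance(e)) {
                throw new ClassCastException("element " + i + " is "
                        + e.getClass().getName()
                        + ", expected " + type.getName());
            }
            out.add(type.cast(e));  // null elements pass through as null
        }
        return Collections.unmodifiableList(out);
    }
}
```

  A caller would write something like
List<String> names = TypedViews.checkedList(raw, String.class);
and a wrongly typed element then fails at that one call, with its
position in the message, rather than somewhere later in the annotator.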

> The above proposal to allow the common Java Collection objects (like ArrayList,
> and Maps) as things in the CAS, plus garbage collection, should make it much more
> convenient to store and work with graphs in the CAS.
> >
> >   * Complex pipelines are a bit clumsy.  I think the biggest obvious
> >     problem is lack of signalling to CAS merger that input CASes have
> >     been exhausted.  Having an "isLast" barrier sounds simple as long
> >     as you have only a single CAS multiplier paired with the CAS merger,
> >     but when this assumption breaks down, things start to deteriorate.
> >     However, I realize complex pipelines are a niche area.
> It would be nice to hear some ideas here.

  (After reading Eddie Epstein's email and coming back to some more of
his emails to me, I realize that the isLast hack I'm using is
unnecessary if I instead use the "process-parent-last" flag of
CASMultiplier.  I'm learning a lot from interacting here!  I guess that
shows we could always make use of more good UIMA code examples...)
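
  As a rough illustration of why a parent-last ordering removes the
need for an explicit isLast marker, here is a plain-Java model.  It
assumes the framework delivers a parent only after all of its children
have been delivered; ParentLastMerger, onChild and onParent are
hypothetical names and none of this is UIMA or UIMA-AS API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical merger model for illustration; not part of UIMA.
class ParentLastMerger {
    // Child results buffered per parent, so that children of several
    // parents may arrive interleaved.
    private final Map<String, List<String>> pending = new HashMap<>();

    void onChild(String parentId, String childResult) {
        pending.computeIfAbsent(parentId, k -> new ArrayList<>())
               .add(childResult);
    }

    // Assumed contract: the parent arrives after all of its children,
    // so its arrival doubles as the "input exhausted" signal.
    List<String> onParent(String parentId) {
        List<String> merged = pending.remove(parentId);
        return merged != null ? merged : new ArrayList<>();
    }
}
```

  Because the buffer is keyed by parent, the same merger keeps working
when more than one multiplier's output is interleaved, which is where
a single global isLast flag starts to break down.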

-- 
				Petr Baudis
	If you have good ideas, good data and fast computers,
	you can do almost anything. -- Geoffrey Hinton
