uima-user mailing list archives

From Petr Baudis <pa...@ucw.cz>
Subject CAS merger/multiplier N:M mapping
Date Sun, 06 Sep 2015 14:11:24 GMT

  I'm currently struggling to perform a complex flow transformation with
UIMA.  I have multiple (N) CASes with some fulltext search results.
I chop these search results into sentences and would like to pick the top
M sentences from all the collected results and build CASes from them
for further analysis.  So, I'd like to copy subsets (both document text
and annotations) of N input CASes to M output CASes.  I don't know how
to do this technically.  I tried two unworkable ideas so far:

  (i) Keep around references to the respective views of input CASes
and use them as CasCopier sources when the time comes to produce
the new CASes.  Turns out the input CASes are (unsurprisingly) recycled
and the references I kept around at process() time aren't valid when
next() is called much later.

  (ii) Use an internal "intermediary" CAS instance in process() to which
I append my sentences, then use it as a source of output CASes.  Turns
out (surprisingly) that I can't append to a Sofa's document text ("Data
for Sofa feature setLocalSofaData() has already been set." - I'm not
sure about the reason for this restriction).
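
  One possible way around both problems, sketched here as plain Java
with no UIMA calls: copy the needed values (sentence text, score,
annotation offsets) out of each input CAS during process(), keep no CAS
references, and pick the top M once all inputs have been seen.  All
class and method names below (SpanCopy, SentenceSnapshot,
SentenceBuffer) are hypothetical, and the score is assumed to come from
the search step.  This is only an illustration of the idea, not a
tested UIMA component.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical plain-data copy of one annotation: a type name plus offsets.
final class SpanCopy {
    final String type;
    final int begin;
    final int end;

    SpanCopy(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical snapshot of one sentence, fully detached from any CAS.
final class SentenceSnapshot {
    final String text;
    final double score;
    final List<SpanCopy> spans; // offsets are relative to the sentence start

    SentenceSnapshot(String text, double score, List<SpanCopy> spans) {
        this.text = text;
        this.score = score;
        this.spans = spans;
    }
}

final class SentenceBuffer {
    private final List<SentenceSnapshot> buffered = new ArrayList<>();

    // Intended to be called from process(): copies values, keeps no CAS reference.
    // Spans are given with document-level offsets and are rebased to the sentence;
    // spans not fully inside the sentence are dropped.
    void add(String documentText, int sentBegin, int sentEnd, double score,
             List<SpanCopy> documentSpans) {
        List<SpanCopy> rebased = new ArrayList<>();
        for (SpanCopy s : documentSpans) {
            if (s.begin >= sentBegin && s.end <= sentEnd) {
                rebased.add(new SpanCopy(s.type, s.begin - sentBegin, s.end - sentBegin));
            }
        }
        String sentenceText = documentText.substring(sentBegin, sentEnd);
        buffered.add(new SentenceSnapshot(sentenceText, score, rebased));
    }

    // Intended to be called after the last input: the M best sentences, best first.
    List<SentenceSnapshot> topM(int m) {
        List<SentenceSnapshot> sorted = new ArrayList<>(buffered);
        sorted.sort(Comparator.comparingDouble((SentenceSnapshot s) -> s.score).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(m, sorted.size())));
    }
}
```

  In next(), each snapshot would then become one output CAS: the
document text is set exactly once from the snapshot's text, and the
annotations are re-created at the stored sentence-relative offsets.
Since nothing is ever appended to an existing Sofa, the "already been
set" restriction should not come up.  The price is that every
annotation type to be preserved needs its own copy code.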

  I think the only choice, short of downright unmaintainable hacks (like
programmatically generated M views), is to just give up on preserving my
annotations and carry over just the sentence texts.  Am I missing
something?

  (I'm somewhat tempted to cut my losses (much too late) and abandon
UIMA flow control altogether, using only simple pipelines with custom
glue code to connect them.  Getting the flow to work in interesting
cases seems to be a huge time sink, and in retrospect it could never
pay off through the abstract advantage of easier distributed processing
(where you probably end up having to chop up the pipeline manually
anyway).  I would probably never recommend that new UIMA users strive
for a single pipeline with CAS multipliers/mergers, and I'm beginning to
consider these features an evolutionary dead end rather than an
advantage.  I'm not sure there even *are* any other real users of
advanced flows besides me and DeepQA.  I'll be glad to hear any opinions
on this!)

				Petr Baudis
	If you have good ideas, good data and fast computers,
	you can do almost anything. -- Geoffrey Hinton
