From: Mario Gazzo <mario.gazzo@gmail.com>
Subject: Error handling in flow control
Date: Fri, 24 Apr 2015 11:37:35 +0200
To: user@uima.apache.org

I am trying to get error handling to work with a custom flow controller.
I need to send status information back to a service after a flow has
completed, with or without errors, but I can do this only once per
workflow item because the report changes the state of the job; reporting
more than once produces error replies and wasteful requests. The problem
is that I need to retry several times before finally failing and
reporting the status to the service. First I tried to let the CPE do the
retries for me by setting the max error count, but then a new flow
object is created on every attempt and I lose track of how many retries
came before it. This means that I don’t know when to report the status
to the service, because that should happen only after the final retry.

I then tried to let the flow instance manage the retries by moving back
to the previous step, but then I get the error
“org.apache.uima.cas.CASRuntimeException: Data for Sofa feature
setLocalSofaData() has already been set”, which occurs because the
document text is set in this particular test case. I also tried to reset
the CAS completely before retrying the pipeline from scratch, and this
of course throws the error “CASAdminException: Can't flush CAS, flushing
is disabled.” It would be less wasteful to retry only the failed step
instead of the whole pipeline, but that requires cleanup, which in some
cases might be impossible. Managing errors appears to be rather complex
because the CAS can be left in an unknown state and an analysis engine
operation is not idempotent. I probably need to restart the whole
pipeline from the beginning if I want more than a single attempt, which
brings me back to the problem of tracking the number of attempts before
reporting back to the service.
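To make the "restart everything" option concrete, here is a minimal
sketch in plain Java. It does not use any UIMA API: the class
WholePipelineRetry, its run method, and the Result type are hypothetical
names, the "state" stands in for a freshly created CAS, and the
"pipeline" stands in for all the analysis steps. It assumes failures
surface as runtime exceptions and that a clean state can be created for
each attempt.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper: reruns a whole pipeline on fresh state per attempt. */
final class WholePipelineRetry {

    /** Final outcome for one work item, suitable for a single status report. */
    static final class Result {
        final boolean success;
        final int attempts;

        Result(boolean success, int attempts) {
            this.success = success;
            this.attempts = attempts;
        }
    }

    /**
     * Runs the pipeline up to maxAttempts times. Every attempt gets a new
     * state object, so nothing left behind by a failed attempt is reused.
     */
    static <S> Result run(Supplier<S> freshState, Consumer<S> pipeline, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            S state = freshState.get(); // stand-in for obtaining a clean CAS
            try {
                pipeline.accept(state);
                return new Result(true, attempt);
            } catch (RuntimeException e) {
                // The state may be half-modified; drop it and start over.
            }
        }
        return new Result(false, maxAttempts);
    }
}
```

The caller gets exactly one Result per work item, so the status report
can be sent once, after run returns, regardless of how many attempts
were needed.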

Does anyone have a good suggestion for how to do this in UIMA, e.g. by
passing state information from a failed flow to the next flow attempt?
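One idea I am considering, sketched below in plain Java without any UIMA
types, is to keep the attempt count outside the per-attempt flow object,
keyed by a stable work item id. AttemptTracker, Decision, and the item
id are hypothetical names of my own. The sketch assumes that such an id
is available on every attempt and that the tracker lives in something
that outlasts the individual flow objects; whether the flow controller
instance itself is reused across CPE retries is something I have not
verified.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper: counts failed attempts per work item across flows. */
final class AttemptTracker {

    /** What the caller should do after a failed attempt. */
    enum Decision { RETRY, REPORT_FAILURE }

    private final int maxAttempts;
    private final ConcurrentMap<String, Integer> failures = new ConcurrentHashMap<>();

    AttemptTracker(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /**
     * Records one failed attempt for the item. Returns REPORT_FAILURE only
     * when the limit is reached, and forgets the item at that point so the
     * failure is reported a single time.
     */
    Decision recordFailure(String itemId) {
        int count = failures.merge(itemId, 1, Integer::sum);
        if (count >= maxAttempts) {
            failures.remove(itemId);
            return Decision.REPORT_FAILURE;
        }
        return Decision.RETRY;
    }

    /** Clears the item after success; returns how many attempts failed before it. */
    int recordSuccess(String itemId) {
        Integer previous = failures.remove(itemId);
        return previous == null ? 0 : previous;
    }
}
```

With a limit of 3, the first two failures for an item would yield RETRY
and the third REPORT_FAILURE, which is the only point at which the
service would be contacted with an error status.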