cocoon-dev mailing list archives

From "Hunsberger, Peter" <Peter.Hunsber...@stjude.org>
Subject RE: Experience with workflow at Hippo Webworks
Date Tue, 09 Mar 2004 15:43:53 GMT
Johan Stuyts <johan@hippo.nl> writes:

<snip/>

> >>>> Each document attached to workflow would need a workflow instance
> >>>> as long as the document lives (from creation to deletion). This
> >>>> would mean the continuation stack of every document needs to be
> >>>> persisted to - well - to a database if you don't want to limit
> >>>> your clustering options. The document has a property holding the
> >>>> continuation ID.
> >>>
> >>> You have a point here, Guido. It is true that continuations in a
> >>> distributed environment would need to be made cluster-friendly and
> >>> replicated. This would probably impact the overall performance...
> >>> but keep in mind that continuations are just another way to save
> >>> state. That kind of state transformation (think REST) will have to
> >>> be done anyway.
> >>
> 
> I thought about this too and think that the current workflow should be
> accessible without the user holding a special token. If I have a tree
> of documents I should be able to invoke any active command on a
> document in that tree.

Absolutely.  In particular, you have to be able to dynamically revise a
workflow across browser sessions.  The original receiver may lose an
authorization (or someone else may gain one), a new team member might
join a processing queue, etc.  When resuming a workflow you have to be
able to evaluate many sets of conditions to determine where to pick up
from.  A simple procedural continuation, although conceptually capable
of doing this, could be extremely complex.  If coded as a state machine
(more on that later), it is essentially equivalent to going back and
re-evaluating all the steps that led to the current state, with the
extra knowledge that you've reached the current state added on top: if
there are N steps in your workflow you may need up to N! state
evaluations.  If coded as a procedural script you essentially need up
to N! "if" statements.

In practice the limit is less than N! because people exploit special
knowledge of a given workflow, but when writing a generic handler for
all possible workflows you would not have this special knowledge and
would not be able to guarantee any kind of scalable performance.
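
To make the combinatorics concrete, here is a minimal sketch in Java
(purely hypothetical code, not Cocoon or the Hippo demo) of a generic
resume handler that knows nothing about the workflow's structure.  All
it can do is test orderings of the completed steps against the
persisted state, and in the worst case that means visiting all N! of
them:

  import java.util.ArrayList;
  import java.util.List;
  import java.util.function.Predicate;

  public class GenericResume {

      // Try orderings of the steps until one is consistent with the
      // persisted state; the worst case visits all N! of them.
      static List<String> findPath(List<String> steps,
                                   Predicate<List<String>> consistent) {
          return search(new ArrayList<>(), new ArrayList<>(steps), consistent);
      }

      private static List<String> search(List<String> path,
                                         List<String> remaining,
                                         Predicate<List<String>> consistent) {
          if (remaining.isEmpty()) {
              return consistent.test(path) ? path : null;
          }
          for (int i = 0; i < remaining.size(); i++) {
              List<String> nextPath = new ArrayList<>(path);
              nextPath.add(remaining.get(i));
              List<String> rest = new ArrayList<>(remaining);
              rest.remove(i);
              List<String> found = search(nextPath, rest, consistent);
              if (found != null) {
                  return found;
              }
          }
          return null;
      }

      public static void main(String[] args) {
          List<String> steps = List.of("draft", "review", "approve", "publish");
          // Toy oracle: only one ordering matches the persisted state,
          // and the generic handler has no way to know which in advance.
          List<String> path = findPath(steps, p ->
              p.equals(List.of("draft", "review", "approve", "publish")));
          System.out.println("Steps that led here: " + path);
      }
  }

A real engine prunes this search with workflow-specific knowledge; a
generic handler has nothing to prune with.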

> >>
> >> The lifetime of a workflow instance also makes a big difference
> >> regarding business process changes. In a previous job at a large
> >> company I saw some workflows (e.g. purchase orders) which changed
> >> several times a year, depending on structural changes in the
> >> company's organization. And already-running instances had to adapt
> >> to the new workflow from the state they were in at the date of the
> >> workflow change.
> >>
> >> This makes continuations technically unsuitable here, as once a
> >> continuation has been created, it's tied to the structure of the
> >> program where it was created, and it cannot be "rerouted" to a
> >> location in another version of the same program.
> >>
> 
> This won't be easy in state machines either, but I think it is
> possible if you store state paths. When the workflow instance is read
> by a new workflow definition, the definition matches the paths to its
> states.

State machines are powerful, but even the most general form is not
Turing complete (a state machine is a special case of a Turing
machine).  Adding "state paths" is essentially a way of encoding the N!
permutations of how you got to some point: you can encode the knowledge
in state evaluations or in complex state data, but it's the same thing
in the end, you have to evaluate many paths to resume the workflow.

For generalized workflow you need something that is Turing complete in
every sense.  In particular, you need the ability to go backwards, and
you need the ability to "write" to the tape (in the Turing sense).
Generalized state machines are one-directional and read-only.
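
To illustrate, here is a minimal sketch in Java of the state-path idea
(hypothetical code, not the Hippo demo): the path persisted with the
instance has to be re-walked against the new definition's transitions,
so the "how did I get here" knowledge is re-evaluated from data instead
of from code, but it is re-evaluated all the same:

  import java.util.List;
  import java.util.Map;
  import java.util.Set;

  public class StatePathMatcher {

      // Re-walk the persisted path of states against the *new*
      // definition's transitions.  Every recorded hop must be
      // re-evaluated -- the "how did I get here" work again, in data.
      static String matchPath(List<String> statePath,
                              Map<String, Set<String>> transitions) {
          String current = statePath.get(0);
          for (int i = 1; i < statePath.size(); i++) {
              String next = statePath.get(i);
              if (!transitions.getOrDefault(current, Set.of()).contains(next)) {
                  return null;  // path no longer fits; manual rerouting needed
              }
              current = next;
          }
          return current;  // the state the instance resumes in
      }

      public static void main(String[] args) {
          // New definition: 'review' now goes to 'legal' before 'publish'.
          Map<String, Set<String>> newDefinition = Map.of(
              "draft", Set.of("review"),
              "review", Set.of("legal"),
              "legal", Set.of("publish"));
          // Path persisted with an instance under the *old* definition.
          List<String> persisted = List.of("draft", "review");
          System.out.println(matchPath(persisted, newDefinition));  // review
      }
  }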

> >> Furthermore, there exist what are called "ad hoc" workflows, where
> >> a user, depending on their rights, can modify the workflow for a
> >> particular instance. An example of this is when a document has to
> >> be published that contains highly technical material. The editor
> >> may want to add an additional step in the workflow for the document
> >> to be validated by a technical expert, which doesn't happen in
> >> normal cases.
> >>
> >> In that situation, asking a user to write a new version of a
> >> program for that specific need doesn't seem a good solution to me
> >> ;-)
> >
> > Wait a second: do you think that guy would be more comfortable
> > writing FSM code?
> >
> 
> I think the option to request review by a technical expert should be
> programmed explicitly by the developers too. Instead of just 'publish'
> and 'disapprove', the editor can also invoke 'request technical
> review'.

The main point here is very important: you have to support dynamic
resumption.  This isn't something that state machines are good at!
State machines are good at atomic operations, things like parsing data
packets (decoding protocols).  It is, IMO, a mistake to try to use
state machines as a generalized way to handle workflows; you need
open-ended rule evaluation.  There are some special types of state
machines that match up with rule evaluation in certain contexts.  I
don't know enough about them to know whether they are applicable to
workflows.  My gut feeling is that you'd be better off starting from a
pure rule-based architecture (see below).

> > Let's compare apples to apples here: we are not discussing how the
> > workflows should be edited, but how they are going to impact our
> > system and how we are going to build them.
> >
> > There are several solutions on the table and at least two
> > architecturally orthogonal questions:
> >
> >   1) should the workflow engine have direct data control?
> 
> For me the data should reside in the document/object to which the
> workflow instance :) is attached. The only information stored in a
> workflow instance is (possibly) an identifier which can be used to
> locate the document/object.
> 
> The actual implementation of the conditions and the actions that get
> executed on certain events should not be in the workflow definition,
> but should be separate. The workflow definition only references these
> separate implementations. The implementations get passed the
> identifier of the document/object when invoked, so they can retrieve
> the document/object to do their work.
> 
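That separation is easy to picture.  A minimal sketch in Java
(hypothetical names, not the actual Hippo demo code): the definition
holds only the names of conditions and actions plus the instance's
document identifier; the implementations live in separate registries
and fetch the document themselves:

  import java.util.Map;
  import java.util.function.Consumer;
  import java.util.function.Predicate;

  public class WorkflowWiring {

      // Named implementations, registered outside the workflow definition.
      static final Map<String, Predicate<String>> CONDITIONS =
          Map.of("is-approved", docId -> lookup(docId).approved);
      static final Map<String, Consumer<String>> ACTIONS =
          Map.of("publish", docId -> lookup(docId).publish());

      // The definition holds only names; the instance holds only the
      // document id.  The implementations retrieve the document themselves.
      static void fire(String condition, String action, String docId) {
          if (CONDITIONS.get(condition).test(docId)) {
              ACTIONS.get(action).accept(docId);
          }
      }

      // Toy document store so the sketch is self-contained.
      static class Document {
          boolean approved = true;
          void publish() { System.out.println("published"); }
      }

      static Document lookup(String docId) { return new Document(); }

      public static void main(String[] args) {
          fire("is-approved", "publish", "doc-42");
      }
  }
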
> >   2) should the workflow engine deal with procedural scripts or
> > finite state machines directly?
> >
> 
> State machine junkie talking: state machines. I see state machines as
> a different way of programming than procedural, and think that coding
> them using procedural code will be more difficult. The conditions and
> actions which connect the workflow instance to the environment it is
> running in are procedural, and that's why I program these in Java (in
> our demo).

Third option: the workflow engine should deal with declarative rule
evaluation.  Declarative rules can be generated on the fly in an
automated fashion.  Moreover, Cocoon already has all the machinery in
place that is needed: XSLT and XML make an excellent framework for rule
evaluation.

The rules are coded as XSLT templates; the current rule evaluation
context is the XML fed into the XSLT.  Some of this context is stored
with the document (any context specific to the document) and some of it
is generated as needed for each step in the workflow, for example, the
sets of authorizations for each potential reviewer for document type X.
Aggregated Cocoon generators create the combined contexts to feed into
the rule evaluation process.
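
Here is a minimal sketch of the idea in plain JAXP Java, standing in
for the sitemap components Cocoon would actually use (the element names
and the single rule are hypothetical).  The stylesheet is the rule set,
the input document is the aggregated context, and the output names what
has to happen next:

  import java.io.StringReader;
  import java.io.StringWriter;
  import javax.xml.transform.Transformer;
  import javax.xml.transform.TransformerFactory;
  import javax.xml.transform.stream.StreamResult;
  import javax.xml.transform.stream.StreamSource;

  public class RuleEvaluation {

      // The rule set, coded as XSLT.  The single rule here (technical
      // documents need an expert review) is purely illustrative.
      static final String RULES =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + " <xsl:template match='/context'>"
        + "  <next-steps>"
        + "   <xsl:if test=\"document/@type = 'technical'\">"
        + "    <flow-step uri='review/technical'/>"
        + "   </xsl:if>"
        + "   <flow-step uri='review/editorial'/>"
        + "  </next-steps>"
        + " </xsl:template>"
        + "</xsl:stylesheet>";

      public static void main(String[] args) throws Exception {
          // The aggregated evaluation context: document state plus
          // whatever per-step data (reviewers, authorizations) applies.
          String context =
              "<context>"
            + " <document id='doc-42' type='technical'/>"
            + " <reviewers><reviewer id='expert-1'/></reviewers>"
            + "</context>";

          Transformer rules = TransformerFactory.newInstance()
              .newTransformer(new StreamSource(new StringReader(RULES)));
          StringWriter out = new StringWriter();
          rules.transform(new StreamSource(new StringReader(context)),
                          new StreamResult(out));
          System.out.println(out);  // the flow steps for this workflow step
      }
  }

In Cocoon proper the aggregation and transformation would be sitemap
work (roughly, a map:aggregate feeding a map:transform); the point is
that rule evaluation falls out of machinery that is already there.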

Having something as "simple" as a URL be the sole end point of an XSLT
might seem strange, but note that the XSLT doesn't have to produce just
a single URI; it could in fact generate an entire tree of Cocoon flow
steps that need to be completed for a given workflow step.

For example, it might say that for step N of the workflow to complete,
there are three browser pages that have to be completed successfully.
These three pages are then handled by regular Cocoon flow (maybe via a
dynamically produced flow script, generated from your XSLT, instead of
a single URL) and, as the final step, the workflow evaluation process
is fired up again (if the user is in a state where that is
appropriate).

> > My take would be:
> >
> >   1) no, it should be separate, sort of a process knowledge base
> > that the flow logic interrogates when it needs to
> >
> >   2) procedural scripts: they are always easier to program
> >
> > But there are valid and solid arguments to make me change my mind
> > on 2), so I think it's better to explore whatever solution makes
> > more sense right now and expand from there, instead of spending
> > too much time on the whiteboard without getting anything out of
> > the door.
> >

A good implementation of workflow handling for Cocoon could be the most
important missing capability left to add.  For the most part, good
workflow engines are expensive, proprietary pieces of software.  If a
generalized, open-source document handling framework included a 90%
solution for workflow, it could truly revolutionize many, many portions
of the IT industry.  This isn't something that you just want to slap
together.

If you want, push a prototype out the door, but if you do so, label it
as such.  Moreover, I would only do so with the explicit intention of
destroying it in the future.  Workflow is just too important to
implement incorrectly.

