oodt-dev mailing list archives

From Bruce Barkstrom <brbarkst...@gmail.com>
Subject Re: Obtaining workflow instance unique ID
Date Fri, 29 Apr 2011 15:05:48 GMT
Here's an additional train of thought about identifiers generally.

I was reading Fielding, R. T. and Taylor, R. N., 2002: Principled
Design of the Modern Web Architecture, ACM Trans. Internet
Technology, Vol. 2, 115-150, a fundamental statement that
lays out the definition of REST.  They note:

"Most software systems are created with the implicit assumption
that the entire system is under the control of one entity, or at
least that all entities participating within a system are acting
towards a common goal and not at cross-purposes.  Such an
assumption cannot be safely made when a system runs openly
on the Internet.  Anarchic scalability refers to the need for architectural
elements to continue operating under unanticipated load, or when
given malformed or maliciously constructed data, since they
may be communicating with elements outside their organizational
control.
...
Multiple organizational boundaries imply that multiple trust
boundaries could be present in any communication.
...
Multiple organizational boundaries also mean that the system
must be prepared for gradual and fragmented change." [p. 119]

I think the notion in this quote also implies that one should expect
separate vocabularies and naming conventions inside individual
organizational boundaries.  As an example, I think it's pretty clear
that the part identifiers used by Ford differ from those used by GM
or Toyota - even for parts that probably fulfill the same functional
requirements.

This suggests that each organization will have its own schemas
for workflow identifiers or file identifiers - whether we like it or not.
I suspect the impossibility of obtaining a single schema is clear,
even with the rage for "standards".  Thus, it may be particularly
important to try to design systems keeping three things in mind:

1.  Systems designers need some way of organizing things, so
they need clear definitions of the architectural structures (particularly
the relationships between objects in the architecture) and they
need to provide definitions that are understandable to people outside
of the architecture design group.

2.  Be prepared for multiple identifier schemas and for translators
between them.  This may get interesting if the schemas make
different organizational assumptions.  Putting this a bit more
concretely, it would be sensible to assume that if there is a
hierarchy, then there will be an alternative hierarchy that some
other organization prefers.  For example, one group of librarians
will use a Dewey Decimal hierarchy for classifying nonfiction, while
another will use the Library of Congress classification scheme.
Same objects - different classification.

3.  Gradual and fragmented change also implies fragmented and
gradual identifier schema change.  In one case, there are some
data files that were stored on the old HPSS storage system with
an identifier schema based on the assumption that magnetic tapes
were stable and would be there forever.  Of course, that classification
and naming convention doesn't seem sensible now.  However, migrating
away from the old convention appears expensive - and the people running
the archive don't feel like they've got the time to design a new system.
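
As a rough illustration of point 2, a translator between two identifier
schemas could be sketched in Java along the following lines.  The class
and method names are hypothetical and are not part of OODT; the sketch
assumes a simple one-to-one mapping, which real schemas with different
hierarchies may not allow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical two-way mapping between a local and a foreign identifier schema. */
final class IdentifierTranslator {
    private final Map<String, String> localToForeign = new HashMap<>();
    private final Map<String, String> foreignToLocal = new HashMap<>();

    /** Registers a pair; rejects a pair that would make the mapping ambiguous. */
    void register(String localId, String foreignId) {
        if (localToForeign.containsKey(localId) || foreignToLocal.containsKey(foreignId)) {
            throw new IllegalArgumentException(
                    "identifier already mapped: " + localId + " / " + foreignId);
        }
        localToForeign.put(localId, foreignId);
        foreignToLocal.put(foreignId, localId);
    }

    /** Looks up the foreign identifier for a local one, if a mapping exists. */
    Optional<String> toForeign(String localId) {
        return Optional.ofNullable(localToForeign.get(localId));
    }

    /** Looks up the local identifier for a foreign one, if a mapping exists. */
    Optional<String> toLocal(String foreignId) {
        return Optional.ofNullable(foreignToLocal.get(foreignId));
    }
}
```

Returning Optional rather than null makes the "no translation known" case
explicit, which matters when the two schemas evolve separately.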

Having said this, I don't think one wants to stop work - but new systems
are not silver bullets that are going to transform the world without work.
To put it somewhat facetiously: "take one pill of clarity about current
assumptions and then put your nose back on the grindstone."

Bruce B.
On Thu, Apr 28, 2011 at 10:58 AM, Mattmann, Chris A (388J)
<chris.a.mattmann@jpl.nasa.gov> wrote:
> Hi Bruce,
>
> Yep, I agree with your assessment of workflows. I am going to add a Wiki
> page containing our thoughts on the workflow system that helped inform the
> design of the OODT workflow manager. Paul Ramirez and I cooked it up in
> 2004-05, but haven't put it on the Apache site yet as we are still
> transitioning some of the material/code.
>
> We'll get it up though soon and the great thing is I think we're in
> alignment in thinking. Thanks for your thoughts!
>
> Cheers,
> Chris
>
> On Apr 26, 2011, at 4:47 AM, Bruce Barkstrom wrote:
>
>> As an outside thought, you might want to think of a workflow as
>> a kind of graph whose vertices are either processes or "files" (metadata
>> perhaps being one kind of file - if you want to think of database transactions
>> as "files", you can add those to the category of "file").  The edges in the
>> graph are events.  Maybe you want to have an inventory of vertices or
>> an inventory of edges.  If you want to think about a hierarchical graph,
>> that might be useful as well, although the data structure and access
>> methods might be a bit more complex.
>>
>> If you want a graph (or collection of graphs - one graph per workflow),
>> you might use hashing on the ID's to access either vertices or edges.
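
A minimal Java sketch of the graph idea above, assuming one graph per
workflow with hash maps keyed by ID for both vertices and edges.  All
names here (WorkflowGraph, VertexKind, and so on) are hypothetical and do
not come from OODT.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory graph for a single workflow; not an OODT class. */
final class WorkflowGraph {
    enum VertexKind { PROCESS, FILE }

    static final class Vertex {
        final String id;
        final VertexKind kind;
        Vertex(String id, VertexKind kind) { this.id = id; this.kind = kind; }
    }

    /** An edge is an event linking two vertices. */
    static final class Edge {
        final String id;
        final String fromId;
        final String toId;
        Edge(String id, String fromId, String toId) {
            this.id = id; this.fromId = fromId; this.toId = toId;
        }
    }

    // Hash-based inventories: one for vertices, one for edges, plus adjacency.
    private final Map<String, Vertex> vertices = new HashMap<>();
    private final Map<String, Edge> edges = new HashMap<>();
    private final Map<String, List<Edge>> outgoing = new HashMap<>();

    void addVertex(String id, VertexKind kind) {
        vertices.put(id, new Vertex(id, kind));
    }

    void addEdge(String id, String fromId, String toId) {
        if (!vertices.containsKey(fromId) || !vertices.containsKey(toId)) {
            throw new IllegalArgumentException("unknown vertex for edge " + id);
        }
        Edge e = new Edge(id, fromId, toId);
        edges.put(id, e);
        outgoing.computeIfAbsent(fromId, k -> new ArrayList<>()).add(e);
    }

    Vertex vertex(String id) { return vertices.get(id); }

    Edge edge(String id) { return edges.get(id); }

    List<Edge> outgoingEdges(String vertexId) {
        return outgoing.getOrDefault(vertexId, Collections.emptyList());
    }
}
```

A hierarchical graph could reuse the same maps by letting a vertex refer
to a nested WorkflowGraph, at the cost of more involved access methods.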
>>
>> Interesting problem.
>>
>> Bruce R. Barkstrom
>>
>> On Mon, Apr 25, 2011 at 11:00 PM, Verma, Rishi (317I)
>> <Rishi.Verma@jpl.nasa.gov> wrote:
>>> Hi Paul,
>>>
>>> Thanks a lot for your suggestions.
>>>
>>> I agree, the second approach is probably the way to go. Although, would
>>> it be more beneficial to save and return a single 'event' ID instead of
>>> multiple workflow instance IDs? I'm under the assumption that workflow
>>> behaves something akin to a job submission system. This assumption may
>>> be incorrect, but it might be beneficial to just get back a single
>>> unique ID for the launch of a workflow top-level event as opposed to
>>> getting back a list of IDs...
>>>
>>> So perhaps within XmlRpcWorkflowManager.handleEvent, we could generate
>>> and save a unique ID for the single event? This ID could be queried by
>>> a client.
>>>
>>> Thanks!
>>> rishi
>>>
>>> On Apr 25, 2011, at 1:25 PM, Ramirez, Paul M (388J) wrote:
>>>
>>>> Hey All,
>>>>
>>>> Thought this might be useful to the larger OODT community.
>>>>
>>>>
>>>>
>>>> Hey Rishi,
>>>>
>>>> I talked with Mike about this last week but never got back to him with
>>>> a concrete answer. However, after putting some more thought into it
>>>> there are 2 approaches I can think of.
>>>>
>>>> First Approach:
>>>> 1) Put some type of user generated id in the metadata object that gets
>>>>    into the workflow instance metadata
>>>> 2) Page through the lists of workflow instances that the workflow
>>>>    manager has and identify those with your tag. Remember an event can
>>>>    kick off more than one workflow.
>>>> 3) Any that have metadata that match your user generated id are the
>>>>    workflow instances you're interested in.
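
The three steps of the first approach can be sketched in Java as below.
This is only an illustration under assumptions: Instance and InstancePager
are hypothetical stand-ins for whatever the workflow manager client
actually returns, and the pager is assumed to return an empty list once
there are no more pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that finds workflow instances carrying a user tag. */
final class TaggedInstanceFinder {
    /** Stand-in for a workflow instance: an id plus its metadata. */
    static final class Instance {
        final String id;
        final Map<String, String> metadata;
        Instance(String id, Map<String, String> metadata) {
            this.id = id;
            this.metadata = metadata;
        }
    }

    /** Stand-in for paged access; returns an empty list past the last page. */
    interface InstancePager {
        List<Instance> page(int pageNum);
    }

    /** Pages through all instances and collects the ids whose metadata matches the tag. */
    static List<String> findIds(InstancePager pager, String key, String tag) {
        List<String> ids = new ArrayList<>();
        for (int pageNum = 1; ; pageNum++) {
            List<Instance> page = pager.page(pageNum);
            if (page.isEmpty()) {
                break;
            }
            for (Instance instance : page) {
                if (tag.equals(instance.metadata.get(key))) {
                    ids.add(instance.id);
                }
            }
        }
        return ids;
    }
}
```

Because one event can start several workflows, the result is a list
rather than a single id.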
>>>>
>>>> Second Approach:
>>>> 1) Add a new method to the client and server so that a list of workflow
>>>>    instance ids is returned
>>>>
>>>> Looking at XmlRpcWorkflowManager.java
>>>>
>>>>
>>>>    public boolean handleEvent(String eventName, Hashtable metadata)
>>>>            throws RepositoryException, EngineException {
>>>>        LOG.log(Level.INFO, "WorkflowManager: Received event: " + eventName);
>>>>
>>>>        List workflows = null;
>>>>
>>>>        try {
>>>>            workflows = repo.getWorkflowsForEvent(eventName);
>>>>        } catch (Exception e) {
>>>>            e.printStackTrace();
>>>>            throw new RepositoryException(
>>>>                    "Exception getting workflows associated with event: "
>>>>                            + eventName + ": Message: " + e.getMessage());
>>>>        }
>>>>
>>>>        if (workflows != null) {
>>>>            for (Iterator i = workflows.iterator(); i.hasNext();) {
>>>>                Workflow w = (Workflow) i.next();
>>>>                LOG.log(Level.INFO, "WorkflowManager: Workflow " + w.getName()
>>>>                        + " retrieved for event " + eventName);
>>>>
>>>>                Metadata m = new Metadata();
>>>>                m.addMetadata(metadata);
>>>>
>>>>                try {
>>>>                    // This returns a workflow instance which has an id
>>>>                    // that could be saved and returned
>>>>                    engine.startWorkflow(w, m);
>>>>                } catch (Exception e) {
>>>>                    e.printStackTrace();
>>>>                    throw new EngineException(
>>>>                            "Engine exception when starting workflow: "
>>>>                                    + w.getName() + ": Message: "
>>>>                                    + e.getMessage());
>>>>                }
>>>>            }
>>>>            return true;
>>>>        } else
>>>>            return false;
>>>>    }
>>>>
>>>> This should probably be a new method on both the client and server and
>>>> should be supplied as a patch. The method would return either
>>>> List<String> or List<WorkflowInstance>, probably the latter, but maybe
>>>> someone else will chime in. Of course this would have to have the
>>>> accompanying client method, but it shouldn't be that big of an update.
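
One possible shape for such a method, sketched in self-contained Java.
The Repository and Engine interfaces below are simplified, hypothetical
stand-ins (workflows are plain names and metadata is a plain map), and
handleEventReturningIds is an invented name, not an existing OODT method.
The only thing taken from the code above is that starting a workflow
yields an instance whose id can be collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical variant of handleEvent that returns the started instance ids. */
final class EventHandlerSketch {
    /** Stand-in for the workflow repository lookup. */
    interface Repository {
        List<String> getWorkflowsForEvent(String eventName);
    }

    /** Stand-in for the engine; returns the id of the instance it started. */
    interface Engine {
        String startWorkflow(String workflowName, Map<String, String> metadata);
    }

    private final Repository repo;
    private final Engine engine;

    EventHandlerSketch(Repository repo, Engine engine) {
        this.repo = repo;
        this.engine = engine;
    }

    /** Starts every workflow mapped to the event and returns one id per instance. */
    List<String> handleEventReturningIds(String eventName, Map<String, String> metadata) {
        List<String> instanceIds = new ArrayList<>();
        List<String> workflows = repo.getWorkflowsForEvent(eventName);
        if (workflows == null) {
            return instanceIds; // no workflows registered for this event
        }
        for (String workflow : workflows) {
            instanceIds.add(engine.startWorkflow(workflow, metadata));
        }
        return instanceIds;
    }
}
```

An empty list plays the role of the original method's false return value,
so a client can still tell that nothing was started.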
>>>>
>>>> My instincts would say that second approach is probably the way to go.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Paul
>>>>
>>>>
>>>>
>>>> On Apr 25, 2011, at 11:39 AM, Verma, Rishi (317I) wrote:
>>>>
>>>>
>>>>   return ((Boolean) client
>>>>                    .execute("workflowmgr.handleEvent", argList))
>>>>                    .booleanValue();
>>>>
>>>> Is there a way to obtain and query for a workflow 'instance ID'?
>>>>
>>>> Thanks!
>>>> Rishi & Mike
>>>>
>>>>
>>>
>>>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
