oodt-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Obtaining workflow instance unique ID
Date Sat, 30 Apr 2011 06:32:19 GMT
Hi Bruce,

Very sound thoughts (and Roy's TOIT paper is a great source of those types of quotes). Regarding
#1, I think in the workflow sense, we've strived to at least maintain a clear, concise definition
of what we're trying to attack in [1]. If there are updates of understanding, etc., I think
we should localize them there and identify any gaps there first. Regarding #2, I'm totally
with you, which is why, to us, identifiers are just metadata that can themselves be transformed.
Finally, regarding #3, I'm also in agreement: the WM has been designed to evolve as the data and
technology understanding evolve over time too.

Thanks again for your comments.

Cheers,
Chris

[1] http://oodt.apache.org/components/maven/workflow/development/developer.html

On Apr 29, 2011, at 8:05 AM, Bruce Barkstrom wrote:

> Here's an additional train of thought about identifiers generally.
> 
> I was reading Fielding, R. T. and Taylor, R. N., 2002: Principled
> Design of the Modern Web Architecture, ACM Trans. Internet
> Technology, Vol. 2, 115-150, a fundamental statement that
> lays out the definition of REST.  They note
> 
> "Most software systems are created with the implicit assumption
> that the entire system is under the control of one entity, or at
> least that all entities participating within a system are acting
> towards a common goal and not at cross-purposes.  Such an
> assumption cannot be safely made when a system runs openly
> on the Internet.  Anarchic scalability refers to the need for architectural
> elements to continue operating under unanticipated load, or when
> given malformed or maliciously constructed data, since they
> may be communicating with elements outside their organizational
> control.
> ...
> Multiple organizational boundaries imply that multiple trust
> boundaries could be present in any communication.
> ...
> Multiple organizational boundaries also mean that the system
> must be prepared for gradual and fragmented change." [p. 119]
> 
> I think the notion in this quote also implies that one should expect
> separate vocabularies and naming conventions inside individual
> organizational boundaries.  As an example, I think it's pretty clear
> that the part identifiers used by Ford differ from those used by GM
> or Toyota - even for parts that probably fulfill the same functional
> requirements.
> 
> This suggests that each organization will have their own schemas
> for workflow identifiers or file identifiers - whether we like it or not.
> I suspect the impossibility of obtaining a single schema is clear,
> even with the rage for "standards".  Thus, it may be particularly
> important to try to design systems keeping three things in mind:
> 
> 1.  Systems designers need some way of organizing things, so
> they need clear definitions of the architectural structures (particularly
> the relationships between objects in the architecture) and they
> need to provide definitions that are understandable to people outside
> of the architecture design group.
> 
> 2.  Be prepared for multiple identifier schemas and for translators
> between them.  This may get interesting if the schemas make
> different organizational assumptions.  Putting this a bit more
> concretely, it would be sensible to assume that if there is a
> hierarchy, then there will be an alternative hierarchy that some
> other organization prefers.  For example, one group of librarians
> will use a Dewey Decimal hierarchy for classifying nonfiction, while
> another will use the Library of Congress classification scheme.
> Same objects - different classification.
> 
> 3.  Gradual and fragmented change also implies fragmented and
> gradual identifier schema change.  In one case, there are some
> data files that were stored on the old HPSS storage system with
> an identifier schema based on the assumption that magnetic tapes
> were stable and would be there forever.  Of course, that classification
> and naming convention doesn't seem sensible now.  However, to migrate
> from the old convention appears expensive - and the people running
> the archive don't feel like they've got the time to design a new system.
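
Point 2 above (multiple identifier schemas with translators between them) could be sketched roughly as below. This is only a minimal illustration: the class name IdentifierTranslatorSketch, its methods, and the sample identifiers are all hypothetical, and a real translator between schemas with different hierarchies would likely need more than a one-to-one lookup table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not OODT code: a one-to-one translator between
// identifiers from two organizations' schemas ("A" and "B").
class IdentifierTranslatorSketch {
    private final Map<String, String> aToB = new HashMap<>();
    private final Map<String, String> bToA = new HashMap<>();

    // Record that idInA and idInB name the same object.
    void link(String idInA, String idInB) {
        aToB.put(idInA, idInB);
        bToA.put(idInB, idInA);
    }

    // Translate from schema A to schema B; empty if no mapping is known.
    Optional<String> toB(String idInA) {
        return Optional.ofNullable(aToB.get(idInA));
    }

    // Translate from schema B to schema A; empty if no mapping is known.
    Optional<String> toA(String idInB) {
        return Optional.ofNullable(bToA.get(idInB));
    }
}
```

Returning Optional makes the "no translation exists" case explicit, which seems likely to be common when the two schemas evolve independently.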
> 
> Having said this, I don't think one wants to stop work - but new systems
> are not silver bullets that are going to transform the world without work.
> To put it somewhat facetiously "take one pill of clarity about current
> assumptions and then put your nose back on the grindstone".
> 
> Bruce B.
> On Thu, Apr 28, 2011 at 10:58 AM, Mattmann, Chris A (388J)
> <chris.a.mattmann@jpl.nasa.gov> wrote:
>> Hi Bruce,
>> 
>> Yep, I agree with your assessment of workflows. I am going to add a Wiki page containing
>> our thoughts on the workflow system that helped inform the design of the OODT workflow manager.
>> Paul Ramirez and I cooked it up in 2004-05, but haven't put it on the Apache site yet as we
>> are still transitioning some of the material/code.
>> 
>> We'll get it up soon though, and the great thing is I think we're in alignment in
>> thinking. Thanks for your thoughts!
>> 
>> Cheers,
>> Chris
>> 
>> On Apr 26, 2011, at 4:47 AM, Bruce Barkstrom wrote:
>> 
>>> As an outside thought, you might want to think of a workflow as
>>> a kind of graph whose vertices are either processes or "files" (metadata
>>> perhaps being one kind of file - if you want to think of database transactions
>>> as "files", you can add those to the category of "file").  The edges in the
>>> graph are events.  Maybe you want to have an inventory of vertices or
>>> an inventory of edges.  If you want to think about a hierarchical graph,
>>> that might be useful as well, although the data structure and access
>>> methods might be a bit more complex.
>>> 
>>> If you want a graph (or collection of graphs - one graph per workflow),
>>> you might use hashing on the IDs to access either vertices or edges.
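
The graph idea above could be sketched as follows, with hash maps keyed by ID serving as the inventories of vertices and edges. This is a minimal sketch under stated assumptions: WorkflowGraphSketch and all of its members are hypothetical names, not part of the OODT workflow manager, and the hierarchical-graph variant is not covered.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not OODT code: vertices are processes or "files",
// edges are events, and both are looked up by hashing their IDs.
class WorkflowGraphSketch {
    enum Kind { PROCESS, FILE }

    static final class Vertex {
        final String id;
        final Kind kind;
        Vertex(String id, Kind kind) { this.id = id; this.kind = kind; }
    }

    static final class Edge {
        final String id;      // event ID
        final String fromId;  // source vertex ID
        final String toId;    // target vertex ID
        Edge(String id, String fromId, String toId) {
            this.id = id; this.fromId = fromId; this.toId = toId;
        }
    }

    // Inventories of vertices and edges, keyed by ID.
    private final Map<String, Vertex> vertices = new HashMap<>();
    private final Map<String, Edge> edges = new HashMap<>();

    void addVertex(String id, Kind kind) {
        vertices.put(id, new Vertex(id, kind));
    }

    // An edge may only connect vertices that are already in the inventory.
    void addEdge(String id, String fromId, String toId) {
        if (!vertices.containsKey(fromId) || !vertices.containsKey(toId)) {
            throw new IllegalArgumentException("Unknown vertex for edge " + id);
        }
        edges.put(id, new Edge(id, fromId, toId));
    }

    Vertex vertex(String id) { return vertices.get(id); }  // null if absent
    Edge edge(String id) { return edges.get(id); }          // null if absent
}
```

One graph object per workflow would match the "collection of graphs" variant mentioned above.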
>>> 
>>> Interesting problem.
>>> 
>>> Bruce R. Barkstrom
>>> 
>>> On Mon, Apr 25, 2011 at 11:00 PM, Verma, Rishi (317I)
>>> <Rishi.Verma@jpl.nasa.gov> wrote:
>>>> Hi Paul,
>>>> 
>>>> Thanks a lot for your suggestions.
>>>> 
>>>> I agree, the second approach is probably the way to go. Although, would it
>>>> be more beneficial to save and return a single 'event' ID instead of multiple workflow instance
>>>> IDs? I'm under the assumption that workflow behaves something akin to a job submission system.
>>>> This assumption may be incorrect, but it might be beneficial to just get back a single unique
>>>> ID for the launch of a workflow top-level event as opposed to getting back a list of IDs...
>>>> 
>>>> So perhaps within XmlRpcWorkflowManager.handleEvent, we could generate and
>>>> save a unique ID for the single event? This ID could be queried by a client.
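
The single-event-ID idea could look something like the sketch below: one generated ID per event, stored alongside the workflow instance IDs that the event started, so a client can query by event ID later. The class EventRegistrySketch and its methods are hypothetical and do not exist in OODT; the in-memory map is an assumption made only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch, not OODT code: maps one generated event ID to the
// workflow instance IDs that were started for that event.
class EventRegistrySketch {
    private final Map<String, List<String>> instanceIdsByEvent = new HashMap<>();

    // Generate and save a unique ID for one handled event.
    String recordEvent(List<String> instanceIds) {
        String eventId = UUID.randomUUID().toString();
        instanceIdsByEvent.put(eventId, new ArrayList<>(instanceIds));
        return eventId;
    }

    // Query the instance IDs for an event; empty list if the ID is unknown.
    List<String> instancesFor(String eventId) {
        List<String> ids = instanceIdsByEvent.get(eventId);
        return ids == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(ids);
    }
}
```

This keeps the client-facing return value to a single string while still allowing the full list of instances to be recovered on request.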
>>>> 
>>>> Thanks!
>>>> rishi
>>>> 
>>>> On Apr 25, 2011, at 1:25 PM, Ramirez, Paul M (388J) wrote:
>>>> 
>>>>> Hey All,
>>>>> 
>>>>> Thought this might be useful to the larger OODT community.
>>>>> 
>>>>> 
>>>>> 
>>>>> Hey Rishi,
>>>>> 
>>>>> I talked with Mike about this last week but never got back to him with
>>>>> a concrete answer. However, after putting some more thought into it there are 2 approaches
>>>>> I can think of.
>>>>> 
>>>>> First Approach:
>>>>> 1) Put some type of user generated id in the metadata object that gets
>>>>>    into the workflow instance metadata
>>>>> 2) Page through the lists of workflow instances that the workflow manager
>>>>>    has and identify those with your tag. Remember an event can kick off more than one workflow.
>>>>> 3) Any that have metadata that match your user generated id are your
>>>>>    workflow instances you're interested in.
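
The first approach could be sketched as below. The workflow instances are modelled with a hypothetical stand-in class rather than the real OODT types, and the metadata key "ClientTag" is an assumption chosen for illustration; the sketch only shows the tag-then-filter logic, not the actual calls for paging through the workflow manager.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch, not OODT code: tag the event metadata with a
// user-generated id, then page through instances and keep the matches.
class TagFilterSketch {
    static final String TAG_KEY = "ClientTag";  // assumed metadata key

    // Stand-in for a workflow instance: an id plus its metadata.
    static final class Instance {
        final String id;
        final Map<String, String> metadata;
        Instance(String id, Map<String, String> metadata) {
            this.id = id;
            this.metadata = metadata;
        }
    }

    // Step 1: the user-generated id to put into the event metadata.
    static String newTag() {
        return UUID.randomUUID().toString();
    }

    // Steps 2 and 3: walk every page and collect ids whose tag matches.
    // More than one id may come back, since one event can start
    // more than one workflow.
    static List<String> findTaggedIds(List<List<Instance>> pages, String tag) {
        List<String> matches = new ArrayList<>();
        for (List<Instance> page : pages) {
            for (Instance inst : page) {
                if (tag.equals(inst.metadata.get(TAG_KEY))) {
                    matches.add(inst.id);
                }
            }
        }
        return matches;
    }
}
```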
>>>>> 
>>>>> Second Approach:
>>>>> 1) Add a new method to the client and server so that a list of workflow
>>>>>    instance ids is returned
>>>>> 
>>>>> Looking at XmlRpcWorkflowManager.java
>>>>> 
>>>>> 
>>>>>   public boolean handleEvent(String eventName, Hashtable metadata)
>>>>>           throws RepositoryException, EngineException {
>>>>>       LOG.log(Level.INFO, "WorkflowManager: Received event: " + eventName);
>>>>> 
>>>>>       List workflows = null;
>>>>> 
>>>>>       try {
>>>>>           workflows = repo.getWorkflowsForEvent(eventName);
>>>>>       } catch (Exception e) {
>>>>>           e.printStackTrace();
>>>>>           throw new RepositoryException(
>>>>>                   "Exception getting workflows associated with event: "
>>>>>                           + eventName + ": Message: " + e.getMessage());
>>>>>       }
>>>>> 
>>>>>       if (workflows != null) {
>>>>>           for (Iterator i = workflows.iterator(); i.hasNext();) {
>>>>>               Workflow w = (Workflow) i.next();
>>>>>               LOG.log(Level.INFO, "WorkflowManager: Workflow " + w.getName()
>>>>>                       + " retrieved for event " + eventName);
>>>>> 
>>>>>               Metadata m = new Metadata();
>>>>>               m.addMetadata(metadata);
>>>>> 
>>>>>               try {
>>>>>                   // This returns a workflow instance which has an id
>>>>>                   // that could be saved and returned
>>>>>                   engine.startWorkflow(w, m);
>>>>>               } catch (Exception e) {
>>>>>                   e.printStackTrace();
>>>>>                   throw new EngineException(
>>>>>                           "Engine exception when starting workflow: "
>>>>>                                   + w.getName() + ": Message: "
>>>>>                                   + e.getMessage());
>>>>>               }
>>>>>           }
>>>>>           return true;
>>>>>       } else
>>>>>           return false;
>>>>>   }
>>>>> 
>>>>> This should probably be a new method on both the client and server and
>>>>> should be supplied as a patch. The method would return either List<String> or
>>>>> List<WorkflowInstance>, probably the latter, but maybe someone else will chime in.
>>>>> Of course this would have to have the accompanying client method, but it shouldn't
>>>>> be that big of an update.
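
A rough sketch of the second approach follows. To stay self-contained it uses hypothetical stand-in interfaces (Engine, Instance) instead of the real OODT classes, and the method name handleEventReturningIds is an assumption; the only thing taken from the code above is that starting a workflow yields an instance whose id can be saved and returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the OODT API: a handleEvent variant that
// collects the id of every workflow instance it starts.
class EventIdSketch {
    // Stand-in for a started workflow instance.
    interface Instance {
        String getId();
    }

    // Stand-in for the engine; starting a workflow returns its instance.
    interface Engine {
        Instance startWorkflow(String workflowName, Map<String, String> metadata);
    }

    // Returns the ids of all instances started for the event's workflows.
    // An empty list plays the role of the original method's "false".
    static List<String> handleEventReturningIds(Engine engine,
            List<String> workflowNames, Map<String, String> metadata) {
        List<String> ids = new ArrayList<>();
        if (workflowNames == null) {
            return ids;
        }
        for (String name : workflowNames) {
            Instance inst = engine.startWorkflow(name, metadata);
            ids.add(inst.getId());
        }
        return ids;
    }
}
```

Returning List<String> rather than full instance objects is just the simpler of the two options mentioned above; either would fit the same loop.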
>>>>> 
>>>>> My instincts would say that the second approach is probably the way to go.
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Paul
>>>>> 
>>>>> 
>>>>> 
>>>>> On Apr 25, 2011, at 11:39 AM, Verma, Rishi (317I) wrote:
>>>>> 
>>>>> 
>>>>>  return ((Boolean) client
>>>>>                   .execute("workflowmgr.handleEvent", argList))
>>>>>                   .booleanValue();
>>>>> 
>>>>> Is there a way to obtain and query for a workflow 'instance ID'?
>>>>> 
>>>>> Thanks!
>>>>> Rishi & Mike
>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 
>> 



