oodt-dev mailing list archives

From: Bruce Barkstrom <brbarkst...@gmail.com>
Subject: Some Further Thoughts on Configuration Management of Complex Workflows
Date: Tue, 23 Sep 2014 13:16:13 GMT
While I won't claim to have done a thorough examination of the proposal
to use the IBM tool for developing workflows, I am concerned about several
items relating to configuration management.  Several articles in the
new Comm. ACM (CACM, Vol. 57, No. 9, 2014) bear on security and
configuration management.  I'd highly recommend getting a copy and
taking a look at the articles in the middle of the issue.

1.  Kern, C., 2014: Securing the Tangled Web, CACM, 57, 38-47, presents
a view of security issues due to script injection vulnerabilities that
make JSON and other technologies that use JavaScript less secure than
one would like.  Kern is an information security engineer at Google.
He describes not only the nature of the XSS vulnerabilities, but also
the work Google has done to reduce their risk.  The remedies include
building in special-character exception handling, designing and testing
automated templates for interface designers to use, and project
management enforcement of strict disciplines that forbid the use of
vulnerable software (a small sketch of the escaping idea follows
below).  Unfortunately, the cures add to the learning curve for using
these tools - and increase the maintenance cost of software because
they need to be applied "forever".
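
To make the flavor of those remedies concrete, here is a minimal
sketch, in Java, of context-aware output escaping of the general kind
Kern describes.  The class and method names are mine, not Google's, and
a real template system would apply different escaping rules per context
(HTML text, attribute, URL, JavaScript):

    // Minimal sketch: escape untrusted input before interpolating it
    // into an HTML text node, so characters with markup meaning cannot
    // begin a script injection.
    public final class HtmlEscape {
        static String escapeHtml(String untrusted) {
            StringBuilder out = new StringBuilder(untrusted.length());
            for (char c : untrusted.toCharArray()) {
                switch (c) {
                    case '&':  out.append("&amp;");  break;
                    case '<':  out.append("&lt;");   break;
                    case '>':  out.append("&gt;");   break;
                    case '"':  out.append("&quot;"); break;
                    case '\'': out.append("&#39;");  break;
                    default:   out.append(c);
                }
            }
            return out.toString();
        }

        public static void main(String[] args) {
            String userInput = "<script>alert('xss')</script>";
            // Safe to embed as HTML text only after escaping.
            System.out.println("<p>" + escapeHtml(userInput) + "</p>");
        }
    }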

2.  Workflows (or, in normal project management nomenclature, Work
Breakdown Structures) are graphs whose complexity increases markedly as
more activities and objects get included.  If one is aiming for high
integrity or fully replicable and transparent software systems, one
must maintain the ability to retain and re-run exact earlier
configurations.  The old NCAR FORTRAN manuals (ca. 1980) had a cover
that embedded the notion "It ran yesterday.  It's been running for
years.  I only changed one line."  This means that software that is
updated (by revisions due to concerns over security or to make other
improvements) could require verification that the updates haven't
changed numerical values (a sketch of such a check follows below).
Based on my personal experience with Ubuntu Linux (or Windows -
whatever), updates occur frequently, with the organizations responsible
for the software deciding when to send out updates.  This rate of
update makes the Web a pretty volatile environment.  In most
organizations that have system administrators, they bear the burden
this turmoil creates.  End users may not realize the impact, but it
costs the administrators time and attention to avoid being overwhelmed.
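
One concrete (hypothetical) form such verification could take: after an
update, re-run the workflow on a fixed test input and compare a digest
of the output against a baseline recorded before the update.  A minimal
Java sketch; the file names here are assumptions of mine:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.HexFormat;

    // Minimal post-update regression check: hash the freshly produced
    // output and compare it with a digest recorded before the update.
    public final class OutputDigestCheck {
        static String sha256Hex(Path file) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(Files.readAllBytes(file)));
        }

        public static void main(String[] args) throws Exception {
            Path baseline = Path.of("baseline.sha256"); // saved pre-update
            Path output   = Path.of("run_output.dat");  // produced post-update
            String expected = Files.readString(baseline).trim();
            String actual   = sha256Hex(output);
            System.out.println(expected.equals(actual)
                ? "Output unchanged by the update."
                : "WARNING: output differs after the update.");
        }
    }

A bit-for-bit hash is the strictest version of the check; workflows
with floating-point output may instead need the numbers compared within
stated tolerances.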

3.  In many of the software packages we use, the organization providing
the software manages package updates with a centralized package
manager.  In Linux, Debian (and the derivative Ubuntu family of
software) uses one centralized manager to produce the .deb packages
that contain appropriate provenance metadata for maintaining an
installation.  Red Hat and SuSE Linux use an alternative format, the
RPM package, with its own metadata format.  These package managers do
not operate in the same way.  For example, if you want to ingest RPM
packages into Ubuntu, you have to install a package called alien and
use that to convert the RPM to .deb format (a sketch of the conversion
follows below).  The same pleasantries affect Java, databases, and Web
standards.  Because some of these organizations are real commercial
enterprises making their money from customers outside of the
contracting venue, it seems unlikely that funding agencies will be able
to impose one common standard for configuration management.  While
funding agencies may think a single standard for configuration would
solve their problems, that would require an unprecedented degree of
cooperation between agencies, data producers, and data users.  The time
scale for reaching agreements on this kind of "social engineering" is
almost certainly at least a decade, during which the technological
basis in hardware and software will have evolved out from under the
agencies.
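
For reference, the conversion step itself is short.  Here is a minimal
Java sketch that drives it via ProcessBuilder (on Ubuntu one would
normally just type the two commands in a shell); the package name
"alien" and its --to-deb option are real, while the example .rpm file
name is made up:

    import java.io.IOException;

    // Minimal sketch of the RPM-to-.deb conversion described above.
    public final class RpmToDeb {
        static int run(String... cmd) throws IOException, InterruptedException {
            return new ProcessBuilder(cmd).inheritIO().start().waitFor();
        }

        public static void main(String[] args) throws Exception {
            // 1. Install the converter (needs sudo privileges).
            run("sudo", "apt-get", "install", "-y", "alien");
            // 2. Convert; alien writes the .deb to the current
            //    directory, after which dpkg can install it.
            run("sudo", "alien", "--to-deb", "example-package-1.0.rpm");
        }
    }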

I suspect that the security issues relating to JSON and such are the
most immediate concern.  On a slightly longer time frame, it's
important to remember that the range of workflow scaling makes a single
tool unlikely.  A solution for data production with short chains of
objects that are relatively isolated (a single investigator conducting
a few investigations per year) is vastly different from production
flows such as weather forecasting or some kinds of climate data
production (large teams of software developers and scientists (100's of
people) running 1000's of jobs per day).  Configuration management for
the latter kind of project requires building group cultures that
recognize the importance of managing the configuration - and that does
take up a lot of time, even for the scientists.

I won't say I'm sorry for the length of the comments.  Some issues
can't be reduced to sound bites or bullets.  The chain of reasoning for
these issues seems to require the longer exposition.
Bruce B.
