gump-general mailing list archives

From Leo Simons <m...@leosimons.com>
Subject Re: Gump3 ideas/questions
Date Wed, 18 May 2005 19:55:12 GMT
Adam Jack wrote:
> Leo (primarily)

oi!

> Relaxed and refreshed after a week on the beach, I'm putting other concerns
> aside and am taking a day for me, to tinker w/ Gump3. So, I'm digging into
> the fresh code (and still consuming the rest of it.) Hopefully after today
> I'll be "in the zone" w/ this stuff, and better able to dig in at will.

:-)

> Interestingly, I find I'm going through some of the same 'grokking pains' as
> folks have felt with Gump2. (Kinda nice to be on the other side of the fence
> for a change, getting to see inside somebody else's head, and liking the
> view.) Despite this being 100% crisper, I still have some questions...

kewl!

> I think in these early days (maybe throughout the life of Gump3) the
> toughest part is the "property messaging protocol(s)". Basically Gump3
> internals are based upon trust & respect between plugins -- like a good OSS
> project :-) -- but it doesn't (yet) have the internal equivalents of
> communications forums of wikis/blogs/docs, nor the history of commit mails,
> mail archives, nor SCMs.

I love that description. Nope, we don't have that. But we can reuse all
those tools just by localizing the trust and respect into an explicit
file! We need

 gump/model/propertymessaging.py

"This module contains *all* the getters/setters that plugins use for
adding data to the model or getting it from the model."

Docs you get as follows:

  $ python
  >>> from gump.model import propertymessaging
  >>> dir(propertymessaging)
  ['mark_failure', 'check_failure', 'set_build_log', 'get_build_log',
   ...]
  >>> for attname in dir(propertymessaging):
  ...   print(attname)
  ...   print("----------------------")
  ...   print(getattr(propertymessaging, attname).__doc__)

We just write a little bit of python code to do the above :-)
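
A rough sketch of what that little bit of python could look like. The
stand-in module and the docstrings are made up for illustration; only
the names set_build_log and get_build_log come from the listing above.

```python
import inspect
import types

def describe_protocol(module):
    """Return 'name / docstring' text for every public function in module."""
    lines = []
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue
        lines.append(name)
        lines.append("-" * 22)
        lines.append(inspect.getdoc(func) or "(undocumented)")
        lines.append("")
    return "\n".join(lines)

# Hypothetical stand-in for gump/model/propertymessaging.py
propertymessaging = types.ModuleType("propertymessaging")

def set_build_log(element, log):
    """Attach the build log text to a model element."""
    element.build_log = log

def get_build_log(element):
    """Return the build log of a model element, or None if there is none."""
    return getattr(element, "build_log", None)

propertymessaging.set_build_log = set_build_log
propertymessaging.get_build_log = get_build_log

print(describe_protocol(propertymessaging))
```

Filtering on functions keeps things like __builtins__ and __name__ out
of the generated docs, which a plain dir() loop would include.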

SCM/commit mails we get by the file being in SVN. We also get
versioning of the protocol that way: just tag revisions of the file:

  svn cp \
    https://svn.apache.org/repos/asf/gump/branches/Gump3/pygump/gump/model/propertymessaging.py \
    https://svn.apache.org/repos/asf/gump/tags/Gump3-propertymessage-protocol-3.0

> My main interest in Gump3's approach is if this can
> fly, and scale. I feel yes, but I think we need tools (docs being just one).
> I'd like to see how/if those tools fit.

It's nice if tools are smart so you don't have to worry about this kind
of stuff most of the time. I have no idea how it fits in, but the
conflict resolution that the svn client uses (you get a

C myfile

report, then myfile.mine, myfile.r123, myfile.r127, and a "myfile" which
has the cvs-style conflict markers) is such an example. You allow
conflicts to develop but make them trivial to find and fix.

> For example, [Question] I'd like to
> know how a module/project status/failure/cause protocol gets agreed between
> builder plugins and (core?) build algorithms? How do I dig into it?

I've actually been changing that a little (make sure you svn upped
everything :-D). From memory,

Anything that has side effects (ie a plugin that handles a Command or
plugins for updating) can fail in two ways:

  -> expected (build failure, cvs auth failure)
  -> unexpected (no build failure but a jar is missing, out of disk
     space)

Plugins should make an attempt to handle an expected failure, ie they
should try and continue processing, and set the "failed" property on the
model element that failed. They should raise exceptions on unexpected
failure. They don't do anything else.
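
A minimal sketch of that split, not the actual Gump3 plugin code. The
Command class and visit_command are assumptions made for the example;
the point is only that a non-zero exit status gets recorded on the
model, while anything else is left to propagate as an exception.

```python
import subprocess
import sys

class Command:
    """Hypothetical model element for one build command."""
    def __init__(self, argv):
        self.argv = argv
        self.failed = False

def mark_failure(element):
    element.failed = True

class ScriptBuilderPlugin:
    """Runs a command; the visit_command name is an assumption."""
    def visit_command(self, command):
        # Unexpected failures (missing executable, no disk space, ...)
        # raise OSError here and are deliberately not caught.
        result = subprocess.run(command.argv)
        if result.returncode != 0:
            # Expected failure: record it on the model and carry on.
            mark_failure(command)

plugin = ScriptBuilderPlugin()
ok = Command([sys.executable, "-c", "raise SystemExit(0)"])
bad = Command([sys.executable, "-c", "raise SystemExit(1)"])
plugin.visit_command(ok)
plugin.visit_command(bad)
print(ok.failed, bad.failed)
```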

Neither the Builder nor the Walker nor any of the "core" code has anything
to do with this, *except* the algorithm (in algorithm.py). All decisions
about what should happen on a failure, all intelligence on what a
failure *means* or what *consequences* a failure should have (for
example, if you can't check out a module no use trying to build it) are
made by the algorithm.

So, for example,

# pseudocode from memory
# B(ant-based project) --depends on--> A(script-based project)
a = Project("A")
b = Project("B")
d = Dependency(dependency=a, dependee=b)
b.add_dependency(d)

a_c = Script("build")
a.add_command(a_c)

b_c = Ant()
b.add_command(b_c)

Possible program flow inside the "main stage":

  1) the core engine tells the algorithm to visit a
  2) all plugins visit a
     2.1) the ScriptBuilderPlugin visits a
     2.2) the script exits with status code 1
     2.3) the ScriptBuilderPlugin marks a_c as "failed"
  3) the algorithm marks a as failed indicating a_c as the cause
  4) the algorithm marks b as failed indicating a as the cause

You can also imagine using a dumb algorithm:

  1) the core engine tells the algorithm to visit a
  2) all plugins visit a
     2.1) the ScriptBuilderPlugin visits a
       2.1.1) the script exits with status code 1
       2.1.2) the ScriptBuilderPlugin marks a_c as "failed"
  3) all plugins visit b
     3.1) the AntBuilderPlugin visits b
       3.1.1) ant exits with status code 1
       3.1.2) the AntBuilderPlugin marks b_c as "failed"
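
The two flows above can be sketched side by side. Everything here is a
stand-in: the Project class, the build() function and both algorithm
functions are invented for the example and are not what lives in
algorithm.py.

```python
class Project:
    """Hypothetical, pared-down model element."""
    def __init__(self, name, build_ok=True):
        self.name = name
        self.build_ok = build_ok      # stands in for the real build
        self.dependencies = []        # projects this one needs
        self.failed = False
        self.failure_cause = None

def build(project):
    """Stand-in for 'all plugins visit the project'."""
    if not project.build_ok:
        project.failed = True
        project.failure_cause = project.name + " build command"

def smart_algorithm(projects):
    """Skip a project when something it depends on has failed."""
    for project in projects:          # assumed to be in dependency order
        broken = [d for d in project.dependencies if d.failed]
        if broken:
            project.failed = True
            project.failure_cause = broken[0].name
        else:
            build(project)

def dumb_algorithm(projects):
    """Build everything regardless of earlier failures."""
    for project in projects:
        build(project)

def sample():
    a = Project("A", build_ok=False)
    b = Project("B", build_ok=False)  # cannot build without A
    b.dependencies.append(a)
    return [a, b]

for algorithm in (smart_algorithm, dumb_algorithm):
    a, b = sample()
    algorithm([a, b])
    print(algorithm.__name__, "->", b.failure_cause)
```

The plugins are identical in both runs; only the algorithm decides
whether B's failure is blamed on A or on B's own command.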

A good way to dig into this might be to change the algorithm used for
the "main" stage. Just copy-paste the code inside algorithm.py into a
new algorithm (call it MyExperimentalAlgorithm or whatever). Comment out
a few lines to make it simpler ("dumber"). Change config.py to use your
new algorithm. Rerun the sample data. See what changes result. Comment
out code in a builder plugin that raises an exception or marks a
failure. See what happens.

(yes, yes, yes, we need unit tests of this stuff. I got impatient, and
the lack of tool support is making me unhappy at times...I miss my
"play" button in the IDE)

> For futures, what is the best way to "manage" that protocol, keeping it
> clean/healthy -- and ensure we don't leak too much "plugin stuff into the
> core" w/o noticing.

I'm not too worried about that, you know. It's easy to detect if you make
sure to only import what you need. gump.plugins should not import
anything from gump.engine. Where they share logic it ends up in
gump.model.util. Keep gump.model.util simple and small with easily
understood utility functions and you're not polluting the core.
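
One way to detect a leak automatically would be a small check over the
plugin sources. This is only a sketch: the function name and the sample
source are made up, and it catches "import gump.engine" and
"from gump.engine... import ..." but not "from gump import engine".

```python
import ast

def forbidden_imports(source, forbidden_prefix="gump.engine"):
    """Return imported module names that fall under forbidden_prefix."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name == forbidden_prefix or name.startswith(forbidden_prefix + "."):
                hits.append(name)
    return hits

# Hypothetical plugin source; it is parsed, never imported or executed.
plugin_source = """
from gump.model.util import mark_failure
from gump.engine.walker import Walker
import gump.engine
"""
print(forbidden_imports(plugin_source))
```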

> As I see the code base the "theoretical aspects" are
> coming to a close as more and more "practicalities" come into play.

:-). Well, there's a few big theoretical aspects I really really want to
tackle. The most important one to get right around now is the "use the
last successful build". It's a really big advantage, and I've spent a lot
of effort in the design trying to make it feasible. I've been working on
that and thinking about it (I'm not sure if I committed anything yet).

> Basically, on Cygwin, I see CVS updates for Ant occurring and the
> bootstrap-ant script being called (albeit, in my environment, failing). I
> want to get past there -- wiring in the Java Ant builder I've written, but
> I'd also not like to miss out on a few more theoretical parts. I'd like to
> see if there are ideas for managing these property messages before we write
> much more.

Well, one thing that I've tried to do recently is to encapsulate
property getting/setting into functions inside gump.model.util. Ie you
don't do

  project.failed = True

but you do

  mark_failure(project)

One of the good things is that you can look up the docstring for
mark_failure. It's a good place to keep the documentation, and it makes
the code more readable.
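
For illustration, such a pair of functions might look like this. The
cause argument, the failure_cause attribute and the docstring wording
are assumptions, not what is in gump.model.util.

```python
def mark_failure(element, cause=None):
    """Record that a model element failed.

    cause is whatever triggered the failure, ie another model element.
    """
    element.failed = True
    element.failure_cause = cause

def check_failure(element):
    """Return True if mark_failure() was called on the element."""
    return getattr(element, "failed", False)

class Project:
    """Hypothetical, empty model element."""
    def __init__(self, name):
        self.name = name

project = Project("A")
before = check_failure(project)
mark_failure(project)
after = check_failure(project)
print(before, after)
print(mark_failure.__doc__)
```

A side benefit is that check_failure can supply a default for elements
nobody has touched yet, so plugins never need to know whether the
attribute exists.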

I think I also have other ideas on this; I'm just not sure what they are.
Maybe, if you start writing code, I'll think "I'd do that differently"
and the ideas fall out (or the other way around, you write some code and
think "this doesn't work well" and the ensuing discussion leads to the
idea). So far I'm simply not too fussed about it.

> I feel that as we develop more and more plugins I'd like to consider how we
> 'keep track' of the interactions, perhaps even the deltas or what properties
> a plugin creates and/or tweaks, and perhaps even attribute ownership.
> Perhaps have a known dictionary (on model objects) of plugins taking credit
> for their efforts.

Hmm. Like you said above, a large part of the system is based on "trust"
between plugins. We need to emphasize that. I'm sure the answer is in there.

This also seems like the perfect place to leverage python's
metaprogramming facilities (ie, in java, I'd use some AOP to detect
ownership rather than having to declare it). I don't know how. It's a
hunch: use a little magic instead of having to do lots of declarations :-)

> I'm not trying to slow down progress, just curious about
> exploring ideas to make plugin communications if not deterministic,
> "trackable" & perhaps self-documenting.

Please, go and explore! Not that many ideas popping into my head here,
but that may also be because I don't see the problem clearly enough yet.

> Also, I suspect that in the real world (the full Gump run) we'll want a
> Reaper plugin, that trims properties, perhaps based off a reap_me
> dictionary.

I'll be frank: I don't think so. I think that if properties become too
big, they should be turned into functionality to access properties. For
example, instead of keeping logs in the model, you store them on disk or
in a database, and you load them into the plugin space (instead of onto
the model) as you need them:

# pseudocode
class MyReporter:
  def visit_project(self, p):
    log = p.buildlog
    write_to_html_in_some_way(log)

class MyLogGenerator:
  def visit_project(self, p):
    log = do_something_to_generate_log(p)
    logkey = "%s_build_log.txt" % p.name

    my_store.write(logkey, log)
    smart_property_decorator(p, "buildlog",
      lambda: my_store.get(logkey))

# with the smart magic here:
def smart_property_decorator(p, propertyname, accessor_func):
  prepare_smart_property_stuff(p)

  p.__accessors__[propertyname] = accessor_func
  # drop any plain attribute so lookups fall through to __getattr__
  if propertyname in p.__dict__:
    delattr(p, propertyname)

def prepare_smart_property_stuff(p):
  if "__accessors__" in p.__dict__:
    return

  p.__accessors__ = {}
  # new-style classes look up __getattr__ on the class, not the instance
  p.__class__.__getattr__ = smart_getattr


def smart_getattr(self, attname):
  # only called when normal attribute lookup fails
  accessors = self.__dict__.get("__accessors__", {})
  if attname not in accessors:
    raise AttributeError(attname)
  return accessors[attname]()
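
A self-contained variation on the same idea that can be run as is. It
is a sketch under assumptions: the store is a plain dict standing in
for disk or a database, and lazy_property, load and _accessors are
names invented for the example.

```python
class Project:
    """Hypothetical model element."""
    def __init__(self, name):
        self.name = name

    def __getattr__(self, attname):
        # Only reached when normal attribute lookup fails.
        accessors = self.__dict__.get("_accessors", {})
        if attname in accessors:
            return accessors[attname]()
        raise AttributeError(attname)

def lazy_property(obj, attname, accessor):
    """Make obj.attname fetch its value on demand through accessor."""
    obj.__dict__.setdefault("_accessors", {})[attname] = accessor
    obj.__dict__.pop(attname, None)   # drop any in-memory copy

store = {}   # stands in for disk or a database
loads = []   # records every trip to the store

def load(key):
    loads.append(key)
    return store[key]

p = Project("A")
key = "%s_build_log.txt" % p.name
store[key] = "BUILD FAILED"
lazy_property(p, "buildlog", lambda: load(key))
print(p.buildlog, loads)
```

The reporter still just reads p.buildlog; it never learns that the log
lives somewhere other than the model.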

Hmm. While typing that, I realized that you can actually build a reaper
that way, and it's just a plugin that looks for the "big" chunks of
memory, writes them to your object db of choice (ie python's "shelve"
module or whatever), then adds an accessor to load them back into memory.

# pseudocode

# move elsewhere of course
import shelve
shelf = shelve.open("mystorage")

# very naive!
unique_id = 0
def get_unique_id():
  global unique_id
  unique_id += 1
  return unique_id

max_mem_string_size = 100

# the plugin
class MyMemoryConserver(AbstractPlugin):
  # and visit_module and visit_other_stuff of course

  def visit_project(self, p):
    # copy the items first, move_to_shelf deletes attributes as we go
    for attname, att in list(vars(p).items()):
      if isinstance(att, str) and len(att) > max_mem_string_size:
        move_to_shelf(p, attname)

# help functions
def move_to_shelf(p, attname):
  prepare_for_shelving(p)

  value = getattr(p, attname)
  shelf[get_attribute_id(p, attname)] = value
  delattr(p, attname)

def prepare_for_shelving(p):
  if "__unique_id__" in vars(p):
    return

  p.__unique_id__ = get_unique_id()

  # special methods are looked up on the class, so install them there
  p.__class__.__getattr__ = getattr_from_shelf
  p.__class__.__setattr__ = setattr_to_shelf

def getattr_from_shelf(self, attname):
  # only called when normal lookup fails, ie for shelved attributes
  if "__unique_id__" in vars(self):
    key = get_attribute_id(self, attname)
    if key in shelf:
      return shelf[key]
  raise AttributeError(attname)

def setattr_to_shelf(self, attname, value):
  # shelved attributes stay on the shelf, the rest is set as usual
  if "__unique_id__" in vars(self):
    key = get_attribute_id(self, attname)
    if key in shelf:
      shelf[key] = value
      return
  object.__setattr__(self, attname, value)

def get_attribute_id(p, attname):
  return "%s.%s" % (p.__unique_id__, attname)

> I'm not sure where/how this fits in, but I suspect it'll be
> needed over time.

I think I just figured out it's just a plugin which you can just insert
as much as needed during the processing to push arbitrary amounts of the
model object tree onto disk. Man, that's cool :-).

However, one wonders if this is really more efficient than having the OS
do the swapping out of memory onto disk. I'm guessing that the OS maybe
has a hard time figuring out what it can reclaim from python (ie what is
the good stuff to swap out).

> Gump2 had to do a lot of cleanup in order not to bloat
> its Python runtime to a crawl. I feel Gump3 will push this far further, and
> some properties will be heavy yet 'used' after the build has occurred. Dependent
> upon the run algorithm some properties can reside in memory long after they
> are useful, and it'd be nice to abstract that 'gc' from the plugins.

I dunno. GC is hard; that's why you use languages that do it for you.
Maybe some weakrefs are in order in some of the collections (ie the ones
on the workspace). But you can only start implementing that kind of
thing when you start seeing problems.
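
If it ever comes to that, a weakref-based collection could look roughly
like this. The Workspace and Project classes are stand-ins, not the
Gump3 model; weakref.WeakValueDictionary is the standard library class.

```python
import gc
import weakref

class Project:
    """Hypothetical model element."""
    def __init__(self, name):
        self.name = name

class Workspace:
    """Hypothetical workspace that indexes projects without owning them."""
    def __init__(self):
        self.projects = weakref.WeakValueDictionary()

workspace = Workspace()
a = Project("A")
workspace.projects[a.name] = a
present_before = "A" in workspace.projects

del a          # drop the only strong reference
gc.collect()   # not needed on CPython, harmless elsewhere
present_after = "A" in workspace.projects
print(present_before, present_after)
```

The catch is that something else then has to hold the strong
references for as long as the objects are needed, otherwise projects
vanish from the workspace mid-run.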

One thing Gump3 doesn't do is keep references to objects anywhere in the
plugins, the utility code, or the algorithm. That means its GC should be
a lot more predictable. Precisely by storing everything in the model and
consistently using a (stateless-)visitor pattern I think we will have to
worry a lot less about objects marked as "used" when they really aren't.

> [I also
> think we'll want to do property() calls to implement a load on demand of log
> files, into strings in memory, rather than spew into memory. We'll see.]
> Anyway, how might a Reaper fit in?

Just another plugin :-)

> BTW: We need to add a test case like this, so Gump3 doesn't crash like Gump2
> does. This (and usually a typo in packages) has been the primary cause for
> run fall-overs in Gump2 recent memory. [Question] Where would I start to fix
> this crash?
> 
>     <project name="bogus5">
>       <module name="doesnotexist"/>
>     </project>

Uh. My guess is that this error surfaces either in the normalizer or the
objectifier. Just add it, look at the stack trace, and add error
handling code as high up that trace as possible.
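
As a starting point for the test case, a check along these lines would
report the dangling reference instead of crashing on it. This is a
guess at the shape only: the <workspace> wrapper, the known_modules
argument and the function name are assumptions, and the real fix
belongs wherever the stack trace points.

```python
import xml.etree.ElementTree as ET

def unresolved_modules(workspace_xml, known_modules):
    """Return (project, module) pairs that reference an unknown module."""
    root = ET.fromstring(workspace_xml)
    problems = []
    for project in root.iter("project"):
        for module in project.findall("module"):
            if module.get("name") not in known_modules:
                problems.append((project.get("name"), module.get("name")))
    return problems

# The <workspace> wrapper is assumed; the project element is the one quoted above.
xml_text = """
<workspace>
  <project name="bogus5">
    <module name="doesnotexist"/>
  </project>
</workspace>
"""
print(unresolved_modules(xml_text, known_modules={"ant"}))
```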

> That all said, I'm back (for now) to looking at why bootstrap-ant gives me
> this on Cygwin (despite the below) so I can attempt dist-ant using the
> JavaAntBuilder.
> 
> ... Bootstrapping Ant Distribution
> JAVA_HOME=c:\j2sdk1.4.2_08
> JAVA=c:\j2sdk1.4.2_08\bin\java
> JAVAC=c:\j2sdk1.4.2_08\bin\javac
> CLASSPATH=lib\optional\junit-3.8.1.jar;lib\xercesImpl.jar;lib\xml-apis.jar;build\classes;src\main;
> 
> ... Compiling Ant Classes
> The system cannot find the path specified.

Well, the CLASSPATH obviously doesn't contain absolute paths, which is
something I would try and change immediately. Additionally, I imagine
there might be a need to translate some cygwin paths to windows paths.
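
For the first part, a small helper could make the entries absolute
before the bootstrap runs. A sketch only: the base directory below is
invented, and ntpath is used so the Windows path rules apply no matter
where the code runs. It does not do any cygwin-to-windows translation.

```python
import ntpath

def absolute_classpath(classpath, base_dir):
    """Make every relative entry of a Windows-style CLASSPATH absolute."""
    entries = [e for e in classpath.split(";") if e]
    return ";".join(
        e if ntpath.isabs(e) else ntpath.normpath(ntpath.join(base_dir, e))
        for e in entries)

cp = r"lib\xercesImpl.jar;build\classes;src\main;"
# c:\gump\ant is a made-up checkout location
print(absolute_classpath(cp, r"c:\gump\ant"))
```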

I'd recommend adding debug statements to the ant bootstrap script so you
know exactly what line causes the error. That might give a clue.

Did I mention I hate cygwin? :)

Pfew. Really need to get back to work...

- LSD

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@gump.apache.org
For additional commands, e-mail: general-help@gump.apache.org

