gump-general mailing list archives

From "Adam R. B. Jack" <aj...@apache.org>
Subject Re: The Gump3 branch
Date Sun, 09 Jan 2005 17:28:35 GMT
> Ooh, long e-mail! I'm gonna try and split this up... :-D

Sorry Dude, I got excited. :-) I'll try to keep them shorter or split them.
[I'll reply a few times to this one.]

Having slept on what I saw, I do have some serious questions, and (to keep
it short, I'll come right to the point, knowing you know I mean it
respectfully) I wonder if there is as much significant difference between
Gump2 and Gump3 as I first thought.  They are much the same.

I have a slight deja-vu feeling here. You've built a nice (clean) start,
like Sam did, but to get from this to a live running system will take much
the same work that I added last time, and I'm not sure the key problems of
Gump2 have been understood/corrected. I'm going to try (over time) to list
every place in Gump2 that I feel would be as bad in Gump3 so we can address
them. This isn't me being petty, but me trying to pressure test this new
approach against my understanding of reality (for all its/my warts).

> I firmly believe there is very little need for different components to
> communicate. If you architect things the IOC way, components will use just
> one or two other components, and their parent can just set up the
> references between all those components.

[ BTW: I still could use help with IOC. I have a crude understanding of it,
but please don't forget to enlighten me if you see I'm missing a point.]

Sure, I see that components ought not need to communicate directly. In Gump2
we have a model tree (workspace/modules/projects) and a (theoretically
separate, but not) tree of results. That tree is for a few projects, or all,
based off the filter of work to do. As components do work on that tree they
store data at the right level (run/workspace/module/project), perhaps even
setting state (failed, etc.). This is Gump2, and (as I hear it) Gump3, no
differences.

I feel it is that tree that is the weakness people consider "bloat". Not
its memory size, but its complexity, all the data stored in there -- and
the fact it is a "batch". That is a key similarity between Gump2/Gump3 and
(IMHO) a key issue to address. The closer I look the more I realize the
similarities between Gump2 and Gump3.

> What will happen is that a component needs a certain kind of result
> available. For example, something that pushes information in the dynagump
> database needs that information, which might be put there by an ant
> builder or something like that. This kind of stuff is trivial in python;
> you just set the property on the relevant part of the model and then
> retrieve it later.
 [...]
> Note that such communication is pretty indirect. For example the start of
> the CvsUpdater plugin I did just pushes information into the model (the
> log of the cvs command, exit status, etc) without worrying who uses that
> information (at the moment, it is just ignored).

Part of the problem is ordering/sequencing. A failed CVS update would not
halt all efforts on a module (builds would still occur) if the module had a
"semi-fresh" copy. (This was due to SF.net CVS being so flaky for so long,
even for Gump-wise stable things like JUnit.) As such, prior to CVS updating
we needed to bring some "stats/history" information into memory, which
enforces an implicit dependency. [Note: Stats Actor today stores Stats on
the Tree, so users (CVS Actor) just ask for it from there, they don't talk
directly.]
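
[A minimal sketch of that implicit ordering dependency, written in modern
Python rather than taken from Gump2. ModuleNode, load_stats, record_update
and the state names are all hypothetical; the point is only that the update
step silently relies on an earlier step having put stats on the tree.]

```python
class ModuleNode:
    """Hypothetical tree node; not the real Gump2 class."""

    def __init__(self, name):
        self.name = name
        self.stats = None      # filled in by the stats step, if it has run
        self.state = "unknown"


def load_stats(node, history):
    """Stats step: copy history for this module onto the tree."""
    node.stats = history.get(node.name, {"has_semi_fresh_copy": False})


def record_update(node, exit_status):
    """Update step: decide the module state from the CVS exit status.

    It never talks to the stats step directly, but it cannot work
    unless that step ran first. That is the implicit dependency.
    """
    if node.stats is None:
        raise RuntimeError("stats must be loaded before updating")
    if exit_status == 0:
        node.state = "updated"
    elif node.stats["has_semi_fresh_copy"]:
        node.state = "stale-but-buildable"   # builds may still proceed
    else:
        node.state = "failed"


node = ModuleNode("junit")
load_stats(node, {"junit": {"has_semi_fresh_copy": True}})
record_update(node, exit_status=1)           # CVS failed, old copy exists
```

[Swap the two calls at the bottom and record_update raises, which is the
kind of order coupling that is easy to miss when it lives only in
attribute names.]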

I know you can do "inter component communications" w/ Python properties,
Gump2 does, but it has no "contract" (as Stefano would say). It is not clean;
it is intricate internals knowledge passed from one component to another. It
is stuff like this (and order dependencies like this) that ties components
together, and keeps things fat. [Gump2 at least used typed member
data/methods on the tree in order to allow some contracts.]
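
[To make the contrast concrete, a small sketch assuming modern Python.
Module, UpdateResult and the accessor names are hypothetical and are not
Gump2 or Gump3 code. The first style attaches an ad hoc property; the
second goes through a typed accessor that both writer and reader can
depend on.]

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpdateResult:
    """Hypothetical typed result that an updater stores on a module."""
    exit_status: int
    log: str


class Module:
    """Hypothetical model node; not the real Gump class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._update_result: Optional[UpdateResult] = None

    def set_update_result(self, result: UpdateResult) -> None:
        if not isinstance(result, UpdateResult):
            raise TypeError("expected an UpdateResult")
        self._update_result = result

    def get_update_result(self) -> UpdateResult:
        if self._update_result is None:
            raise LookupError(f"no update result recorded for {self.name}")
        return self._update_result


module = Module("junit")

# Ad hoc style: any component can attach any attribute, and a reader
# must know the exact name and shape from the writer's internals.
module.cvs_log = "cvs update: ok"

# Contract style: writer and reader agree on UpdateResult.
module.set_update_result(UpdateResult(exit_status=0, log="cvs update: ok"))
status = module.get_update_result().exit_status
```

[The typed version costs more lines, but a missing or malformed result
fails loudly at the accessor instead of somewhere downstream.]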

What you are suggesting is almost exactly how Gump2 works, and is (I fear)
where the thoughts of "bloat" come from.

> > There
> > were times when building logic wanted to know something historically
> > (had this built before, etc.) in order to determine how much effort (or
> > what switches) to use. Is inter-component communications like this a real
> > no-no, or is this something that might be "coincidentally" allowed via
> > steps in pre-processing, etc.
>
> We don't need "steps". Think unix command line utilities. You can make
> them communicate:
>
>   find . -type f | grep -v ".svn"

I'm a PIPE lover as much as the next guy, but simple flat stream pipes are
not what we are building. Our components use complex results. Do we need
contracts for those, or things (like DOM tree/XML structures) that we can
persist/stream/validate? [How does Cocoon address this?]

> Without steps. That "|" there in gump is achieved by setting a property on
> a piece of the model.

As with Gump2, but the properties grow and need management. They (and
implicit dependencies) are the bloat.

> Plugins
> ------- 
> >  I think that generating plug-ins (perhaps even for loading, and such) is
> > key. I'm not sure (yet) if the new model is any better than the old in
> > allowing the "core steps" (loading, modelling) to be plugged-in, but I
> > think it needs to be investigated.
>
> Yes, it's easy. Change the get_verifier() in config.py to provide a
> different implementation, and that's it!
>
> > I see you have a Maven parser, but could/should
> > that be a plug-in?
>
> I doubt we should be talking about this kind of stuff as a "plugin".
> There's very specific bits of functionality that *need* to be performed
> (right "contracts") for gump to work. To me, a plugin is something you can
> leave out and still have something that basically works.

I think *the* key problem with Gump2 is "what is core" and "what can be
plugged in". Maybe I (and you) are getting a little carried away with what
can be a plug-in, and maybe too many things are invalidly coded as such. Is
"historical information" a fundamental service or some swappable component?
[Please forgive me if I fail to know the correct terminology for 'corn
concerns' or whatever. Perhaps teach me what I need to communicate more
clearly with you.]

The problem with Gump2 (and why it is a batch, and less able to be
incremental/split) is that we have metadata loading as a stage, and not "on
demand". As such we blast (we hope) through the whole metadata, building a
tree, and then work it as a batch. It is hard to allow folks to plug in
loaders (e.g. Maven parsers), and harder still to allow them to build/load
the in-memory structures themselves. This is true for "loading", for
"modelling", and for much of our core. This is where we fail to have a
system that we or others can break into pieces and use in pieces. I think
this is where we need components.
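
[A rough sketch of the "on demand" alternative to a batch metadata stage,
assuming modern Python. LazyWorkspace and fake_loader are hypothetical
names, and the loader is a stand-in for whatever would really parse Gump
or Maven descriptors.]

```python
from typing import Callable, Dict


class LazyWorkspace:
    """Hypothetical workspace that loads project metadata on first use."""

    def __init__(self, loader: Callable[[str], dict]) -> None:
        self._loader = loader            # pluggable metadata loader
        self._projects: Dict[str, dict] = {}
        self.load_count = 0              # how many real loads happened

    def project(self, name: str) -> dict:
        # Load only when first asked for; afterwards serve the cached copy.
        if name not in self._projects:
            self._projects[name] = self._loader(name)
            self.load_count += 1
        return self._projects[name]


def fake_loader(name: str) -> dict:
    """Stand-in for a real metadata parser (an assumption, not Gump code)."""
    return {"name": name, "depends": []}


workspace = LazyWorkspace(fake_loader)
workspace.project("junit")
workspace.project("junit")   # second call is served from the cache
```

[Nothing is parsed until a project is requested, so a run that touches a
few projects never pays for the whole tree, and a different loader can be
passed in without touching the workspace.]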

I don't know if all things can be simple components, or if we need some
"interfaces" (e.g. a LoaderComponent, a BuilderComponent, etc.). In Gump2 I
tried the latter (if not formally as components) 'cos I felt it was less
pure, more practical, and better fitting the need. I'd like to hear
viewpoints on that, 'cos I think it is key.
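
[One way such "interfaces" could look, sketched with Python's abc module.
Only the names LoaderComponent and BuilderComponent come from the text
above; the method signatures, DictLoader, EchoBuilder and run are
hypothetical and exist only for illustration.]

```python
from abc import ABC, abstractmethod


class LoaderComponent(ABC):
    """Contract for anything that turns a source into project metadata."""

    @abstractmethod
    def load(self, source: str) -> dict:
        ...


class BuilderComponent(ABC):
    """Contract for anything that builds a loaded project."""

    @abstractmethod
    def build(self, project: dict) -> bool:
        ...


class DictLoader(LoaderComponent):
    """Trivial loader used only for this sketch."""

    def load(self, source: str) -> dict:
        return {"name": source}


class EchoBuilder(BuilderComponent):
    """Trivial builder: 'succeeds' if the project has a name."""

    def build(self, project: dict) -> bool:
        return bool(project.get("name"))


def run(loader: LoaderComponent, builder: BuilderComponent, source: str) -> bool:
    # The caller depends only on the two contracts, not on the classes.
    return builder.build(loader.load(source))


result = run(DictLoader(), EchoBuilder(), "junit")
```

[A class that forgets to implement load() or build() cannot be
instantiated, which gives a small amount of contract checking without
requiring every component to share one shape.]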

> > Thanks Leo. Good job. [and now my mind is racing w/ thoughts around
> > this, thanks for waking me up! I hope I don't cut a finger off w/ the
> > jaws 'cos I'm distracted. ;-)]
>
> Hehehe. Do let us know you're alright dude!

Fingers all still here, and still as "fat" as always. ;-) Burn building next
week, and once I (w/ a too enthusiastic career instructor) melted my helmet
in one of those.  I'll try to bring my brain back, from next week, w/o too
much new frying. :-)

regards,

Adam


