hdt-dev mailing list archives

From Bob Kerns <...@acm.org>
Subject Re: initial plugin split done
Date Sat, 16 Feb 2013 07:23:24 GMT
Hi; I really think we're going to need to build up some test
infrastructure, and now is when to start.

I think a key part of that should be mocking up the back end components.
While we'd like to also be able to run against actual backend services,
this won't always be feasible for a few reasons:

1) Availability of a test cluster. I don't know what infrastructure Apache
has, but even if it can support a test cluster, we'll need several for
different versions.

2) Performance -- the overhead of launching jobs to set up test conditions
is likely to add up pretty quickly. It's worse if we do it for n different
test clusters -- and really bad if we also have to launch those clusters.

3) Reproducibility -- it will be difficult to reliably
and reproducibly drive a cluster to a desired state. This is especially
true for transient and fault states.

4) Defect injection -- it may not be possible to drive a cluster to a state
that will exercise certain error handling. Network failures would be another
example.

Also, I think that pursuing the ability to inject mock back ends drives us
toward the sort of connector architecture we want. I think there are two
layers to this -- one focused on low-level communications connectivity --
the requests and responses to the various servers on the cluster. The
higher layer would focus on abstract functionality -- submit a job, list
running jobs, etc. The higher level is what I think we want to call the
connectors; the lower level we could call a transport.
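
As a rough sketch of that two-layer split (the names here are illustrative,
not actual HDT APIs): the transport only moves requests and responses, while
the connector phrases cluster operations in terms of whatever transport it
was handed.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Lower layer: raw request/response traffic to the cluster's daemons. */
interface Transport {
    String send(String request) throws IOException;
}

/** Higher layer: abstract cluster operations -- submit a job, list jobs. */
interface Connector {
    String submitJob(String jobSpec) throws IOException;
    List<String> listRunningJobs() throws IOException;
}

/** A connector that only knows how to phrase operations; all I/O goes
 *  through whichever Transport it was constructed with. */
class SimpleConnector implements Connector {
    private final Transport transport;

    SimpleConnector(Transport transport) {
        this.transport = transport;
    }

    public String submitJob(String jobSpec) throws IOException {
        return transport.send("SUBMIT " + jobSpec);
    }

    public List<String> listRunningJobs() throws IOException {
        // Hypothetical wire format: a comma-separated list of job ids.
        return Arrays.asList(transport.send("LIST").split(","));
    }
}
```

The point of the split is that the connector never opens a socket itself, so
swapping the transport (real network, SSH proxy, or a test double) requires
no change to the connector code.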

I actually think there's a small need for pluggable transports beyond
testing: the normal network one, and one based on SSH proxying. I've seen
an environment where not all of the functionality was visible from outside
the cluster's network environment.

But the main role of the transport would be to facilitate testing.
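
For instance, a mock transport for tests might look something like this
(again an illustrative sketch, not an actual HDT class): it replays canned
responses and can inject a failure on demand, which gives tests the
reproducible fault states that are hard to reach against a real cluster.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Lower-layer interface the mock stands in for. */
interface Transport {
    String send(String request) throws IOException;
}

/** Test double: replays scripted responses deterministically and can be
 *  told to fail the next request, simulating a network fault. */
class MockTransport implements Transport {
    private final Deque<String> responses = new ArrayDeque<>();
    private boolean failNext = false;

    /** Queue up the response the cluster "would" return. */
    void enqueue(String response) { responses.add(response); }

    /** Inject a defect: the next send() throws an IOException. */
    void failNextRequest() { failNext = true; }

    public String send(String request) throws IOException {
        if (failNext) {
            failNext = false;
            throw new IOException("injected network failure");
        }
        return responses.remove();
    }
}
```

A test can then drive the higher-level connector through success paths,
transient states, and error handling without a cluster, which addresses
points 1 through 4 above.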

On Fri, Feb 15, 2013 at 7:54 PM, Adam Berry <adamb@apache.org> wrote:

> On Mon, Feb 11, 2013 at 10:49 PM, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
> > Hey Adam,
> >
> > Great work! Do you think it's now time for a first release? Even if it's
> > not fully functional, and even if it doesn't support everything you
> > mention in paragraph #2 below, it will be a great incremental milestone.
> >
> > Thoughts?
> >
> > Cheers,
> > Chris
> >
> >
> > On 2/11/13 8:29 PM, "Adam Berry" <adamb@apache.org> wrote:
> >
> > >Hey guys,
> > >
> > >So first, let me say thanks for the patience as I worked on this.
> > >
> > >I've split up the original single plugin into a few logical units as we
> > >discussed before. I've thrown up the beginnings of a wiki page,
> > >http://wiki.apache.org/hdt/HDTGettingStarted, with notes on how to grab
> > >this and work with it. The maven/tycho build support still needs to go
> > >in, but I should be able to get to that this week.
> > >
> > >So now we are ready to start attacking multi hadoop version support! We
> > >need multi version clients for launching jobs on hadoop clusters, also
> > >for interacting with hdfs on the same clusters; those will need the
> > >connectors that we discussed before. The other spot is in the jars that
> > >get added to the classpath for MapReduce projects.
> > >
> > >Although the plugins are logically split, some of the classes in them
> > >need some more work to better split the work between core and ui, so
> > >keeping refactoring in mind would be good, I think. For now, the Hadoop
> > >imports are satisfied using the org.apache.hadoop.eclipse plugin, which
> > >bundles Hadoop 1.0.4.
> > >
> > >I've added some JIRAs as trackers for this feature work, so feel free to
> > >jump in to the source and chime in!
> > >
> > >Cheers,
> > >Adam
> >
> Hey guys,
> sorry for the delay in responding, I was struck down by the flu.
> I'm really not sure here, so comments and thoughts are more than
> welcome.
> We can probably make the current set of tools work with 1.0 without too
> much trouble, but we would also need some tests and documentation before
> release, not necessarily exhaustive, but at least something. Pursuing a
> release quickly would likely help to drive interest and momentum for the
> tools.
> I think I'm leaning in favor of doing that; it gets us used to doing
> Apache releases and the other pieces of infrastructure, and would let us
> get visible within the hadoop community sooner.
> I believe it's still important to start work on the multi version support as
> soon as possible, but I think we can do that in parallel to the release of
> some tools that support the 1.0 line.
> So let me know what you guys think, and we'll go from there.
> Cheers,
> Adam
