bigtop-user mailing list archives

From <n...@reactor8.com>
Subject RE: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?
Date Thu, 18 Jun 2015 06:26:11 GMT
Building on conversations pre/during/post ApacheCon and looking at the post-1.0 bigtop focus
and efforts, I want to lay out a few things and get people's comments.  There seems to be some
consensus that the project can look towards serving end application/data developers more going
forward, while continuing the tradition of the project's build/pkg/test/deploy roots.

I have spent the past couple of months, and heavily the past 3 or so weeks, talking to many different
potential end users at meetups, conferences, etc., and also having some great conversations with
commercial open source vendors that are interested in what a "future bigtop" can be and what
it could provide to users.

I believe we need to put some focused effort into a few foundational things to put the project
in a position to move faster and attract a wider range of users as well as new contributors.

-----------
CI "2.0"
-----------

A start on this is already underway, based on the work Roman started last year and the continuing
effort on the new setup and enhancements to the bigtop AWS infrastructure; Evans has been pushing
this along into the 1.0 release.  The speed of getting new packages built and up to date needs
to increase so releases can happen at a regular clip, even looking towards user-friendly
"ad-hoc" bigtop builds where users could quickly choose the 2, 3, 4, etc. components they want
and have a stack built around that.

Related to this, I am hoping the group can come to some agreement on semver-style versioning
for the project post 1.0.  I think this could set a path forward for releases that can happen
faster, while not holding up the whole train if a single "smaller" component has a couple of
issues that can't/won't be resolved by the main stakeholders or interested parties in said component.
An example might be a new pig or sqoop having issues: the 1.2 release would still go out
the door, with 1.2.1 coming days/weeks later once the new pig or sqoop was fixed up.

---------------------------------------------
Proper package repository hosting
---------------------------------------------

I put together a little test setup based on the 0.8 assets; we can probably build off of that
with 1.0, working towards the CI automatically posting nightly (or just-in-time) builds off
latest so people can play around.  Debs/rpms should be the focal point of output for the
project assets; everything else is additive and builds off of that (i.e. the user who says "I am
not a puppet shop so don't care about the modules, but I do my own automation, and if you
point me to some sane repositories I can do the rest myself with a couple of decent getting-started
steps").

-----------------------------------------------------------------
Greatly increasing the UX and getting started content
-----------------------------------------------------------------

This is the big one: new website, focused docs and getting-started examples for end users,
other specific content for contributors.  I will be starting to put some cycles into the new website
jira, probably starting next week; I will try to scoot through it and start posting some working
examples for feedback once something basic is in place.  For those interested in helping out
on doc work and getting-started content, let me know.  I am looking at subjects like:

   -Developer getting started
         -using the packages
         -using puppet modules and deployment options
         -deploying reference example stacks
         -setting up your own big data CI
         -etc

   -Contributing to Bigtop:
         -how to submit your first patch/pull-request
         -adding new component (step by step, canned learning component example, etc)
         -adding tests to an existing component (steps, canned hello world example test, etc)
         -writing your own test data generator
         -etc

Those are some thoughts and a couple of initial focal areas that are driving my bigtop participation.



-----Original Message-----
From: Andrew Purtell [mailto:apurtell@apache.org] 
Sent: Tuesday, June 16, 2015 12:02 PM
To: dev@bigtop.apache.org
Cc: user@bigtop.apache.org
Subject: Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

> thanks andy - i agree with most of your opinions around continuing to build
> standard packages.. but can you clarify what was offensive ?  must be a misinterpretation
> somewhere.

Sure.

A bit offensive.

"gridgain or spark can do what 90% of the hadoop ecosystem already does, supporting streams,
batch,sql all in one" -> This statement deprecates the utility of the labors of rest of
the Hadoop ecosystem in favor of Gridgain and Spark. As a gross generalization it's unlikely
to be a helpful statement in any case.

It's fine if we all have our favorites, of course. I think we're set up well to empirically
determine winners and losers, we don't need to make partisan statements. Those components
that get some user interest in the form of contributions that keep them building and happy
in Bigtop will stay in. Those that do not get the necessary attention will have to be culled
out over time when and if they fail to compile or pass integration tests.


On Mon, Jun 15, 2015 at 11:42 AM, jay vyas <jayunit100.apache@gmail.com>
wrote:

> thanks andy - i agree with most of your opinions around continuing to 
> build standard packages.. but can you clarify what was offensive ?  
> must be a misinterpretation somewhere.
>
> 1) To be clear, i am 100% behind supporting standard hadoop build rpms that
> we have now.   That's the core product and will be for the foreseeable
> future, absolutely !
>
> 2) The idea (and its just an idea i want to throw out - to keep us on 
> our toes), is that some folks may be interested in hacking around, in 
> a separate branch - on some bleeding edge bigdata deployments - which 
> attempts to incorporate resource managers and  containers as 
> first-class citizens.
>
> Again this is all just ideas - not in any way meant to derail the 
> packaging efforts - but rather - just to gauge folks interest level in 
> the bleeding edge, docker, mesos, simplified  processing stacks, and so on.
>
>
>
> On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell <apurtell@apache.org>
> wrote:
>
> > > > gridgain or spark can do what 90% of the hadoop ecosystem already 
> > > > does, supporting streams, batch,sql all in one)
> >
> > If something like this becomes the official position of the Bigtop 
> > project, some day, then it will turn off people. I can see where you 
> > are coming from, I think. Correct me if I'm wrong: We have limited 
> > bandwidth, we should move away from Roman et. al.'s vision of Bigtop 
> > as an inclusive distribution of big data packages, and instead 
> > become highly opinionated and tightly focused. If that's accurate, I 
> > can sum up my concern as
> > > follows: To the degree we become more opinionated, the less we may 
> > > have to look at in terms of inclusion - both software and user 
> > > communities. For example, I find the above quoted statement a bit 
> > > offensive as a participant on not-Spark and not-Gridgain projects. 
> > > I roll my eyes sometimes at the Docker over-hype. Is there still a 
> > > place for me here?
> >
> >
> >
> > On Mon, Jun 15, 2015 at 9:22 AM, jay vyas 
> > <jayunit100.apache@gmail.com>
> > wrote:
> >
> >> Hi folks.   Every few months, i try to reboot the conversation about the
> >> next generation of bigtop.
> >>
> > >> There are 3 things which i think we should consider : A backplane
> > >> (rather than deploy to machines), the meaning of the term "ecosystem" in a 
> > >> post-spark in-memory apocalypse, and containerization.
> >>
> > >> 1) BACKPLANE: The new trend is to have a backplane that provides 
> > >> networking abstractions for you (mesos, kubernetes, yarn, and so on).
> > >> Is it time for us to pick a resource manager?
> >>
> > >> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole 
> > >> hadoop ecosystem, and there is a huge shift to in-memory, 
> > >> monolithic stacks happening (i.e. gridgain or spark can do what 90% 
> > >> of the hadoop ecosystem already does, supporting streams, batch,sql
> > >> all in one).
> >>
> >> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
> >> Is it time to start experimenting with running docker tarballs ?
> >>
> > >> Combining 1+2+3 - i could see a useful bigdata upstream distro 
> > >> which (1) just installed an HCFS implementation (gluster,HDFS,...) 
> > >> along side, say, (2) mesos as a backplane for the tooling for
> > >> [[ hbase + spark + ignite ]] --- and then (3) do the integration
> > >> testing of available mesos-framework plugins for ignite and spark
> > >> underneath.  If other folks are interested, maybe we could create
> > >> the "1x" or "in-memory" branch to start hacking on it sometime ?
> > >> Maybe even bring the flink guys in as well, as they are
> > >> interested in bigtop packaging.
> >>
> >>
> >>
> >> --
> >> jay vyas
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet 
> > Hein (via Tom White)
> >
>
>
>
> --
> jay vyas
>



--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

