bigtop-user mailing list archives

From Olaf Flebbe <...@oflebbe.de>
Subject Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?
Date Thu, 18 Jun 2015 21:32:06 GMT
Thanks Nate, for this focused writeup!

Yeah maybe it is time to reboot our brains ...

In addition to Nate's points, I would like to attack this in Bigtop 1.1.0:

-------------------------------------
Building from source or downloading?
-------------------------------------

However, we have a substantial problem hidden deep in the CI "2.0" approach using containers:

You may know that we place the artifacts (i.e. jars) we build with Bigtop into the local Maven
cache ~/.m2 (look for mvn install in do-component-build). The idea is that later Maven builds
will pick up these artifacts and use them rather than downloading them from Maven Central.
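
For illustration, the tail of a do-component-build script might look roughly like this (a
minimal sketch; the real per-component scripts vary):

  # build the component and drop its jars into the local Maven cache
  # (~/.m2/repository), where later builds can resolve them
  mvn clean install -DskipTests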

Placing artifacts into ~/.m2 will not have any effect if we use CI containers the way we do
now: the Maven cache ~/.m2 is lost when the container exits.
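
To make the failure mode concrete, a sketch (the image name and gradle task are illustrative,
not our exact CI invocation):

  # each CI build runs in a fresh, throwaway container
  docker run --rm bigtop/slaves:ubuntu-14.04 bash -c './gradlew hadoop-rpm'
  # ~/.m2 lives inside the container's filesystem, so it is discarded on
  # exit; the next component build starts from an empty cache and falls
  # back to Maven Central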

[BTW, this triggered the misfeature reported in BIGTOP-1893: a local gradle rpm/apt build
behaved differently from a container build that used artifacts from Maven Central.]

Option 1)  Remove mvn install from all do-component-builds

Results:

+ We compile projects the way the upstream developer does.
- Local fixes and configurations will not be propagated.

Questions:
If we do not try to reuse our build artifacts during compilation, we have to ask ourselves
"why do we compile projects at all?".

Comparing artifacts would make a great test of whether someone else has touched or manipulated
the Maven Central cache, but is that really the point of compiling ourselves?


Option 2) Use mvn install and reuse artifacts even in containers.

Consequences:

- Containers are not stateless any more.

- We have to add dependencies between CI jobs so they run in order.

- A single component may break the whole compile process.

- Compilation does not scale any more.
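
For completeness, a shared cache under Option 2 could be wired up with a volume mount, roughly
like this (image name and task again illustrative); the mounted, shared state is exactly what
makes the containers stateful and forces the ordering above:

  # mount a host directory as the container's Maven cache so artifacts
  # survive the container and are visible to later component builds
  docker run --rm -v "$HOME/.m2:/root/.m2" bigtop/slaves:ubuntu-14.04 \
      bash -c './gradlew spark-rpm'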

My Opinion:
The way we do "mvn install" now, simply tainting the Maven cache, does not seem like a
controlled way to propagate artifacts to me.

Option 3) Use 1), but reuse artifacts across packages by placing symlinks and dependencies
between them.

- Packages will break with subtle problems if we symlink artifacts from different releases.
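
To illustrate the risk (paths and versions are hypothetical):

  # hypothetical: the spark package reuses the hadoop-common jar shipped
  # by the hadoop package instead of bundling its own copy
  ln -s /usr/lib/hadoop/hadoop-common-2.6.0.jar /usr/lib/spark/lib/hadoop-common.jar
  # this breaks subtly if spark was compiled against a different hadoop
  # release than the one the symlink now resolves to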

----
Neither Option 1, Option 2 nor Option 3 seems a clever way to fix the problem. I would like
to hear comments regarding this issue:


In my humble opinion we should follow Option 2, with all its grave consequences. But maybe we
should rework mvn install, placing the artifacts into the Maven cache under a Bigtop-specific
name / groupId, and upload them to Maven Central.
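
A sketch of what that reworked step could look like, using the stock maven-install-plugin
(the groupId and coordinates are made up for illustration):

  # install the freshly built jar under a Bigtop-specific groupId, so it
  # cannot be confused with the upstream artifact from Maven Central
  mvn install:install-file \
      -Dfile=hadoop-common-2.6.0.jar \
      -DgroupId=org.apache.bigtop.thirdparty \
      -DartifactId=hadoop-common \
      -Dversion=2.6.0-bigtop \
      -Dpackaging=jar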

Olaf

> On 18.06.2015 at 08:26, nate@reactor8.com wrote:
> 
> Building on conversations pre/during/post ApacheCon, and looking at the post-1.0 Bigtop
focus and efforts, I want to lay out a few things and get people's comments.  There seems to
be some consensus that the project can look towards serving end application/data developers
more going forward, while continuing the tradition of the project's build/pkg/test/deploy roots.
> 
> I have spent the past couple months, and heavily the past 3 or so weeks, talking to many
different potential end users at meetups, conferences, etc.., also having some great conversations
with commercial open source vendors that are interested in what a "future bigtop" can be and
what it could provide to users.
> 
> I believe we need to put some focused effort into few foundational things to put the
project in a position to move faster and attract a wider range of users as well as new contributors.
> 
> -----------
> CI "2.0"
> -----------
> 
> The start of this is already underway, based on the work Roman started last year and the
continuing effort with the new setup and enhancements on the Bigtop AWS infrastructure; Evans
has been pushing this along into the 1.0 release.  The speed of getting new packages built and
up to date needs to increase so releases can happen at a regular clip.., even looking towards
user-friendly "ad-hoc" Bigtop builds where users could quickly choose the 2,3,4,etc components
they want and have a stack around that.
> 
> Related to this, I am hoping the group can come to some agreement on semver-style versioning
for the project post-1.0.  I think this could set a path forward for releases that can happen
faster, while not holding up the whole train if a single "smaller" component has a couple of
issues that can't/won't be resolved by the main stakeholders or interested parties in said
component.  An example might be a new pig or sqoop having issues.., the 1.2 release would
still go out the door, with 1.2.1 coming days/weeks later once the new pig or sqoop was fixed
up.
> 
> ---------------------------------------------
> Proper package repository hosting
> ---------------------------------------------
> 
> I put together a little test setup based on the 0.8 assets; we can probably build off of
that with 1.0, working towards the CI automatically posting nightly (or just-in-time) builds
off latest so people can play around.  Debs/rpms should be the focal point of output for the
project assets; everything else is additive and builds off of that (ie: a user who says
"I am not a puppet shop so I don't care about the modules.., but I do my own automation, and
if you point me to some sane repositories I can do the rest myself with a couple of decent
getting-started steps")
> 
> -----------------------------------------------------------------
> Greatly increasing the UX and getting started content
> -----------------------------------------------------------------
> 
> This is the big one.., a new website, focused docs and getting-started examples for end
users, and other specific content for contributors.  I will be starting to put some cycles
into the new website JIRA probably starting next week, and will try to scoot through it and
start posting some working examples for feedback once something basic is in place.  For those
interested in helping out on doc work and getting-started content, let me know.., looking at
subjects like:
> 
>   -Developer getting started
>         -using the packages
>         -using puppet modules and deployment options
>         -deploying reference example stacks
>         -setting up your own big data CI
>         -etc
> 
>   -Contributing to Bigtop:
>         -how to submit your first patch/pull-request
>         -adding new component (step by step, canned learning component example, etc)
>         -adding tests to an existing component (steps, canned hello world example test, etc)
>         -writing your own test data generator
>         -etc
> 
> Those are some thoughts and a couple of initial focal areas that are driving my Bigtop
participation.
> 
> 
> 
> -----Original Message-----
> From: Andrew Purtell [mailto:apurtell@apache.org]
> Sent: Tuesday, June 16, 2015 12:02 PM
> To: dev@bigtop.apache.org
> Cc: user@bigtop.apache.org
> Subject: Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane
? Containers?
> 
>> thanks andy - i agree with most of your opinions around continuing to
>> build standard packages.. but can you clarify what was offensive ?
>> must be a misinterpretation somewhere.
> 
> Sure.
> 
> A bit offensive.
> 
> "gridgain or spark can do what 90% of the hadoop ecosystem already does, supporting streams,
batch,sql all in one" -> This statement deprecates the utility of the labors of rest of
the Hadoop ecosystem in favor of Gridgain and Spark. As a gross generalization it's unlikely
to be a helpful statement in any case.
> 
> It's fine if we all have our favorites, of course. I think we're set up well to empirically
determine winners and losers; we don't need to make partisan statements. Those components
that get some user interest in the form of contributions that keep them building and happy
in Bigtop will stay in. Those that do not get the necessary attention will have to be culled
out over time, when and if they fail to compile or pass integration tests.
> 
> 
> On Mon, Jun 15, 2015 at 11:42 AM, jay vyas <jayunit100.apache@gmail.com>
> wrote:
> 
>> thanks andy - i agree with most of your opinions around continuing to
>> build standard packages.. but can you clarify what was offensive ?
>> must be a misinterpretation somewhere.
>> 
>> 1) To be clear, i am 100% behind supporting the standard hadoop build rpms
>> that we have now.  That's the core product and will be for the foreseeable
>> future, absolutely!
>> 
>> 2) The idea (and it's just an idea i want to throw out - to keep us on
>> our toes) is that some folks may be interested in hacking around, in
>> a separate branch - on some bleeding-edge bigdata deployments - which
>> attempt to incorporate resource managers and containers as
>> first-class citizens.
>> 
>> Again this is all just ideas - not in any way meant to derail the
>> packaging efforts - but rather - just to gauge folks interest level in
>> the bleeding edge, docker, mesos, simplified  processing stacks, and so on.
>> 
>> 
>> 
>> On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell <apurtell@apache.org>
>> wrote:
>> 
>>>> gridgain or spark can do what 90% of the hadoop ecosystem already
>>>> does, supporting streams, batch, sql all in one)
>>> 
>>> If something like this becomes the official position of the Bigtop
>>> project, some day, then it will turn off people. I can see where you
>>> are coming from, I think. Correct me if I'm wrong: We have limited
>>> bandwidth, we should move away from Roman et al.'s vision of Bigtop
>>> as an inclusive distribution of big data packages, and instead
>>> become highly opinionated and tightly focused. If that's accurate, I
>>> can sum up my concern as follows: To the degree we become more
>>> opinionated, the less we may have to look at in terms of inclusion -
>>> both software and user communities.  For example, I find the above
>>> quoted statement a bit offensive as a participant on not-Spark and
>>> not-Gridgain projects. I roll my eyes sometimes at the Docker
>>> over-hype. Is there still a place for me here?
>>> 
>>> 
>>> 
>>> On Mon, Jun 15, 2015 at 9:22 AM, jay vyas
>>> <jayunit100.apache@gmail.com>
>>> wrote:
>>> 
>>>> Hi folks.   Every few months, i try to reboot the conversation about the
>>>> next generation of bigtop.
>>>> 
>>>> There are 3 things which i think we should consider: a backplane
>>>> (rather than deploying to machines), the meaning of the term
>>>> "ecosystem" in a post-spark in-memory apocalypse, and
>>>> containerization.
>>>> 
>>>> 1) BACKPLANE: The new trend is to have a backplane that provides
>>>> networking abstractions for you (mesos, kubernetes, yarn, and so
>>>> on).  Is it time for us to pick a resource manager?
>>>> 
>>>> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole
>>>> hadoop ecosystem, and there is a huge shift to in-memory,
>>>> monolithic stacks happening (i.e. gridgain or spark can do what
>>>> 90% of the hadoop ecosystem already does, supporting streams,
>>>> batch, sql all in one).
>>>> 
>>>> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
>>>> Is it time to start experimenting with running docker tarballs ?
>>>> 
>>>> Combining 1+2+3 - i could see a useful bigdata upstream distro
>>>> which (1) just installed an HCFS implementation (gluster, HDFS, ...)
>>>> alongside, say, (2) mesos as a backplane for the tooling for
>>>> [[ hbase + spark + ignite ]] --- and then (3) do the integration
>>>> testing of available mesos-framework plugins for ignite and spark
>>>> underneath.  If other folks are interested, maybe we could create
>>>> the "1x" or "in-memory" branch to start hacking on it sometime?
>>>> Maybe even bring the flink guys in as well, as they are interested
>>>> in bigtop packaging.
>>>> 
>>>> 
>>>> 
>>>> --
>>>> jay vyas
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> 
>>>   - Andy
>>> 
>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>> Hein (via Tom White)
>>> 
>> 
>> 
>> 
>> --
>> jay vyas
>> 
> 
> 
> 
> --
> Best regards,
> 
>   - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
> 

