bigtop-user mailing list archives

From Konstantin Boudnik <...@apache.org>
Subject Re: What will the next generation of bigtop look like?
Date Tue, 09 Dec 2014 05:16:50 GMT
On Mon, Dec 08, 2014 at 11:57PM, Jay Vyas wrote:
> "Let's see if we can be smart and define the landscape"
> 
> Well put @cos...I think Roman's point was that it would be hard, not that it
> would be bad. And I think you're both right: it's hard? Yes. But
> worthwhile... Possibly? Next step we will all have to get in a room and
> think about this face to face.
> 
> Let's shoot for a meetup after January in California... where we can plan
> the future direction of bigtop. In the meantime, I hope to hear more opinions
> on this.

+1. I can host at WANdisco, or perhaps there are other options?

Cos

> > On Dec 8, 2014, at 3:23 PM, Konstantin Boudnik <cos@apache.org> wrote:
> > 
> > First, I want to address RJ's question:
> > 
> > The most prominent downstream Bigtop dependency would be any commercial
> > Hadoop distribution, like HDP or CDH. The former is trying to
> > disguise its affiliation by pushing Ambari forward, and Cloudera is seemingly
> > shifting its focus to compressed tarball media (aka parcels), which require
> > a closed-source solution like Cloudera Manager to deploy and control your
> > cluster, effectively rendering it useless if you ever decide to uninstall the
> > control software. In the interest of full disclosure, I don't think parcels
> > have any chance of shifting the industry consensus away from Linux
> > packaging towards something as obscure and proprietary as parcels are.
> > 
> > 
> > And now to my actual points....:
> > 
> > I strongly believe that Bigtop was and is the only way of building your
> > stack from the ground up, and deploying and controlling it any way you want
> > to, that is completely transparent, vendor-friendly, and sticks 100% to
> > official ASF product releases. I agree with Roman's presentation on how this
> > project can move forward. However, I somewhat disagree with his view of the
> > prospects. It might be a hard road to drive the opinion of the community.
> > But it is a high road.
> > 
> > We are definitely small and mostly unsupported by the commercial groups that
> > are using the framework. Being a box of LEGO won't win us anything. If anything,
> > the empirical evidence is against it, as commercial distros have decided to
> > move towards their own means of "vendor lock-in" (yes, you heard me
> > right - that's exactly what I said: all so-called open-source companies have
> > invented a way to lock in their customers, either with fancy "enterprise
> > features" that aren't adding to but amending the underlying stack, or with a
> > custom set of patches that oftentimes renders clusters incompatible between
> > different vendors).
> > 
> > By all means, my money is on the second way, yet slightly modified (as
> > use-cases are coming from users, not developers):
> >  #2 start driving adoption of software stacks for the particular kind of data workloads
> > 
> > This community has enough day-to-day practitioners on board to
> > accumulate near-complete insight into where the technology is moving.
> > And instead of wobbling in a backwash, let's see if we can be smart and define
> > this landscape. After all, Bigtop adopted Spark well before any of the
> > commercial vendors officially accepted it. We seemingly are moving more and
> > more into the in-memory realm of data processing: Apache Ignite (GridGain),
> > Tachyon, Spark. I don't know how much legs Hive has left, but I am doubtful
> > that it can walk for much longer... Maybe it's just me.
> > 
> > In this thread http://is.gd/MV2BH9 we already discussed some of the aspects
> > influencing the future of this project. And we are de facto working on the
> > implementation. In my opinion, Hadoop has been more or less commoditized
> > already. And that isn't a bad thing, but it means that the innovations are
> > elsewhere. E.g. Spark is moving beyond its ties to the storage layer via the
> > Tachyon abstraction; GridGain simply doesn't care what the underlying storage
> > is. However, data needs to be stored somewhere before it can be processed. And
> > HCFS seems to fit the bill OK. But, as I said already, I see the real
> > action elsewhere. If I were to define the shape of our mid- to long'ish term
> > roadmap, it'd be something like this:
> > 
> >            ^   Dashboard/Visualization  ^
> >            |     OLTP/ML processing     |
> >            |    Caching/Acceleration    |
> >            |         Storage            |
> > 
> > And around this we can add/improve on deployment (R8???),
> > virtualization/containers/clouds.  In other words - let's focus on the
> > vertical part of the stack, instead of simply supporting the status quo.
> > 
> > Does Cassandra fit the Storage layer in that model? I don't know and, more
> > importantly, I don't care. If there's interest and manpower to have a
> > Cassandra-based stack - sure, but perhaps let's do it as a separate branch or
> > something, so we aren't over-complicating things. As Roman said earlier, in
> > this case it'd be great to engage Cassandra/DataStax people in this project.
> > But something tells me they won't be eager to jump on board.
> > 
> > And finally, all of the above leads to "how": how can we start reshaping the
> > stack into its next incarnation? Perhaps the Ubuntu model might be an answer
> > to that, but we have discussed that elsewhere and dropped the idea as it wasn't
> > feasible back in the day. Perhaps its time has come?
> > 
> > Apologies for a long post.
> >  Cos
> > 
> > 
> >> On Sun, Dec 07, 2014 at 07:04PM, RJ Nowling wrote:
> >> Which other projects depend on BigTop?  How will the questions about the
> >> direction of BigTop affect those projects?
> >> 
> >> On Sun, Dec 7, 2014 at 6:10 PM, Roman Shaposhnik <roman@shaposhnik.org>
> >> wrote:
> >> 
> >>> Hi!
> >>> 
> >>> On Sat, Dec 6, 2014 at 3:23 PM, jay vyas <jayunit100.apache@gmail.com>
> >>> wrote:
> >>>> hi bigtop !
> >>>> 
> >>>> I thought I'd start a thread with a few vaguely related thoughts I have
> >>>> around the next couple of iterations of bigtop.
> >>> 
> >>> I think in general I see two major ways for something like
> >>> Bigtop to evolve:
> >>>   #1 remain a 'box of LEGO bricks' with very little opinion on
> >>>        how these pieces need to be integrated
> >>>   #2 start driving opinionated use-cases for the particular kind of
> >>>        bigdata workloads
> >>> 
> >>> #1 is sort of what all of the Linux distros have been doing for
> >>> the majority of time they existed. #2 is close to what CentOS
> >>> is doing with SIGs.
> >>> 
> >>> Honestly, given the size of our community so far and a total
> >>> lack of corporate backing (with a small exception of Cloudera
> >>> still paying for our EC2 time) I think #1 is all we can do. I'd
> >>> love to be wrong, though.
> >>> 
> >>>> 1) Hive:  How will bigtop evolve to support it, now that it is much more
> >>>> than a mapreduce query wrapper?
> >>> 
> >>> I think Hive will remain a big part of Hadoop workloads for the foreseeable
> >>> future. What I'd love to see more of is rationalizing things like how
> >>> HCatalog, etc. need to be deployed.
> >>> 
> >>>> 2) I wonder whether we should confirm cassandra interoperability of spark
> >>>> in bigtop distros,
> >>> 
> >>> Only if there's a significant interest from cassandra community and even
> >>> then my biggest fear is that with cassandra we're totally changing the
> >>> requirements for the underlying storage subsystem (nothing wrong with
> >>> that, it's just that in Hadoop ecosystem everything assumes very HDFS'ish
> >>> requirements for the scale-out storage).
> >>> 
> >>>> 4) in general, i think bigtop can move in one of 3 directions.
> >>>> 
> >>>>  EXPAND ? : Expanding to include new components, with just basic interop,
> >>>> and let folks evolve their own stacks on top of bigtop on their own.
> >>>> 
> >>>>  CONTRACT+FOCUS ?  Contracting to focus on a lean set of core components,
> >>>> with super high quality.
> >>>> 
> >>>>  STAY THE COURSE ? Staying the same ~ a packaging platform for just
> >>>> hadoop's direct ecosystem.
> >>>> 
> >>>> I am intrigued by the idea of A and B; both have clear benefits and
> >>>> costs... would like to see the opinions of folks --- do we lean in one
> >>>> direction or another? What are the criteria for adding a new feature,
> >>>> package, stack to bigtop?
> >>>> 
> >>>> ... Or maybe I'm just overthinking it and should be spending this time
> >>>> testing spark for 0.9 release....
> >>> 
> >>> I'd love to know what others think, but for 0.9 I'd rather stay the course.
> >>> 
> >>> Thanks,
> >>> Roman.
> >>> 
> >>> P.S. There are also market forces at play that may fundamentally change
> >>> the focus of what we're all working on in the next year or so.
> >>> 
