incubator-chukwa-user mailing list archives

From Jerome Boulon <jbou...@netflix.com>
Subject Re: Begin a discussion about Chukwa as a top level project
Date Fri, 09 Apr 2010 18:24:01 GMT
I would like to take advantage of this to see a long term roadmap for Chukwa
before anything else.

I personally started Chukwa 2 years ago to explore and push the Hadoop limits,
moving from a pure batch system to something in between online analytics and
hourly/daily analytics but still on top of Hadoop.
My personal goal is to have a robust data collection pipeline and a robust
processing pipeline on top of the Hadoop ecosystem.
I personally don't need any UI just a robust and efficient backend that can
get the job done.
I would like to be able to natively talk to any data store
(Hive/Zebra/HBase/Voldemort/...) if it makes sense from a user perspective
but I don't want Chukwa to be yet another NoSQL project.
Also, the more I use Chukwa in different places to solve different
problems, the more I think that Chukwa should be an SDK instead of trying to
be an end-to-end system.
People have different agendas and requirements: some need to use Avro, some
Thrift, some Pig or Hive, etc.
Having a one-size-fits-all system seems good, but I can see some issues in
trying to have an end-to-end system for everyone. Just an example: I'm running
a modified version of Chukwa in production since I need to load the Demux
output to Hive. This will make some people happy but some will not be able
to use Hive, so what should I do? Commit my changes that will break the
current workflow and lose some of our users?
On the other hand, some are just using the data collection pipeline and not
the Demux, pushing the data to another store directly from ChukwaCollector.
I can see some valuable components here to stream directly to MySQL, HBase
or Voldemort. Or like others, you may want to optimize the data store for
online display only. All of them have valid points and cases but if we are
not an SDK but a product that you install then those choices will make some
people unhappy.

I'm not saying that I don't want a product at the end; maybe it's just a
refactoring to split the Chukwa tree into components and then have different
pipelines/assemblies.
I think that we should first clearly state why we are, or are not, working on
or just using Chukwa, and I hope that based on this discussion the choice
will be easier.

- Eric, you said Yes to TLP, can you list the reasons for your vote?

/Jerome.


On 4/8/10 5:32 PM, "Bill Graham" <billgraham@gmail.com> wrote:

> Besides the one pro that Eric points out (having more latitude to control the
> direction of Chukwa), are there any other strong pros or cons that are
> influencing people's votes?
> 
> There were some discussions on the Pig list regarding the issue, and I think a
> few of the points apply similarly to Chukwa. To paraphrase a few:
> 
> Reasons to become TLP:
> - Is there a plan to make Chukwa agnostic to the underlying data
> storage/analysis engine (i.e. Hadoop)? If so, then TLP makes more sense.
> 
> Reasons to not become TLP:
> - Would the project be able to grow its community base better as a Hadoop
> project through better site integration with Hadoop (i.e., SEO and such)?
> Basically, would more people find out about Chukwa?
> 
> The Pig folks are leaning towards not becoming a TLP until the day comes when
> the technology is to become Hadoop agnostic.
> 
> Just throwing out their rationale to get a feel for why Chukwa's would differ.
> Would Chukwa get more support and guidance from being a TLP? Or would Chukwa's
> resources (considerably fewer than Pig's) be overly taxed with administrative
> responsibilities as a TLP?
> 
> I don't know the answer to these questions, but just throwing them out as fuel
> for discussion. I personally could go either way, provided there's a good
> argument where the pros clearly outweigh the cons.
> 
> thanks,
> Bill
> 
> 
> 
> On Thu, Apr 8, 2010 at 11:00 AM, Paul Tremblett <ptremblett@swva.net> wrote:
>> ++TLP
>> 
>> On Apr 8, 2010, at 12:46 PM, Eric Yang wrote:
>> 
>>> > You have probably heard by now that there is a discussion going on in
>>> > the Hadoop PMC as to whether a number of the subprojects (Hbase, Avro,
>>> > Zookeeper, Hive, and Pig) should move out from under the Hadoop
>>> > umbrella and become top level Apache projects (TLP). This discussion
>>> > has picked up recently since the Apache board has clearly communicated
>>> > to the Hadoop PMC that it is concerned that Hadoop is acting as an
>>> > umbrella project with many disjoint subprojects underneath it. They
>>> > are concerned that this gives Apache little insight into the health
>>> > and happenings of the subproject communities which in turn means
>>> > Apache cannot properly mentor those communities.
>>> >
>>> > The purpose of this email is to start a discussion within the
>>> > Chukwa community about this topic. Let me cover first what becoming
>>> > TLP would mean for Chukwa, and then I'll go into what options I
>>> > think we as a community have.
>>> >
>>> > Becoming a TLP would mean that Chukwa would itself have a PMC that
>>> > would report directly to the Apache board. Who would be on the PMC
>>> > would be something we as a community would need to decide. Common
>>> > options would be to say all active committers are on the PMC, or all
>>> > active committers who have been a committer for at least a year. We
>>> > would also need to elect a chair of the PMC. This lucky person would
>>> > have no additional power, but would have the additional responsibility
>>> > of writing quarterly reports on Chukwa's status for Apache board
>>> > meetings, as well as coordinating with Apache to get accounts for new
>>> > committers, etc. We currently submit these same reports, however they
>>> > are forwarded to the board through the Hadoop PMC Chair. For more
>>> > information see
>>> > http://www.apache.org/foundation/how-it-works.html#roles
>>> >
>>> > Becoming a TLP would not mean that we are ostracized from the Hadoop
>>> > community. We would continue to be invited to Hadoop Summits, HUGs,
>>> > etc.
>>> >
>>> > I see three ways that we as a community can respond to this:
>>> >
>>> > 1) Say yes, we want to be a TLP now.
>>> >
>>> > 2) Say yes, we want to be a TLP, but not yet. We feel we need more
>>> > time to mature. If we choose this option we need to be able to clearly
>>> > articulate how much time we need and what we hope to see change in
>>> > that time.
>>> >
>>> > 3) Say no, we feel the benefits for us staying with Hadoop outweigh
>>> > the drawbacks of being a disjoint subproject. If we choose this, we
>>> > need to be able to say exactly what those benefits are and why we feel
>>> > they will be compromised by leaving the Hadoop project.
>>> >
>>> > There may be other options that I haven't thought of. Please feel free to
>>> > suggest any you think of.
>>> >
>>> > Here are the thoughts I've formed so far on the subject:
>>> >
>>> > Benefits of moving to TLP:
>>> >
>>> > a) Here's the board's view as communicated to me:
>>> >
>>> > "we're looking to ensure that proper and effective oversight is
>>> >  reached, and umbrellas can get in the way of that. If you *also* think
>>> >  that all of your communities have proper oversight, and that you're
>>> >  communicating enough about each/all of them to the Board, so that *it*
>>> >  can provide oversight, then that's just fine. Go do the review and
>>> >  come back and say, "we're all good. no changes are necessary.""
>>> >
>>> > b) setting our own course - we would have our own PMC and therefore
>>> > have more latitude (within the Apache rules of course) in setting
>>> > direction. PMC members would be focused on Chukwa exclusively.
>>> >
>>> > Let the discussion begin.
>>> >
>>> > 1) I vote for TLP, and I recommend Ari to become the PMC.
>>> >
>>> > Regards,
>>> > Eric
>>> >
>>> >
>> 
> 
> 

