chukwa-dev mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: [DISCUSSION] Making HBaseWriter default
Date Mon, 22 Nov 2010 21:19:32 GMT
We are going to continue to have use cases where we want log data
rolled up into 5-minute, hourly and daily increments in HDFS to run
MapReduce jobs on them. How will this model work with the HBase
approach? What process will aggregate the HBase data into time
increments like the current demux and hourly/daily rolling processes
do? Basically, what does the time partitioning look like in the HBase
storage scheme?

> My concern is that the demux process is going to become two parallel
> tracks, one works in mapreduce, and another one works in collector.  It
> becomes difficult to have clean efficient parsers which works in both

This statement makes me concerned that you're implying the need to
deprecate the current demux model, which is very different from making
one or the other the default in the configs. Is that the case?



On Mon, Nov 22, 2010 at 11:41 AM, Eric Yang <eyang@yahoo-inc.com> wrote:
> MySQL support has been removed from Chukwa 0.5.  My concern is that
> the demux process is going to become two parallel tracks, one works in
> mapreduce, and another one works in collector.  It becomes difficult to
> have clean efficient parsers which works in both places.  From an
> architecture perspective, incremental updates to data are better than
> batch processing for near-real-time monitoring purposes.  I'd like to
> ensure the Chukwa framework can deliver on Chukwa's mission statement,
> hence I stand by HBase as the default.  I was playing with the HBase
> 0.20.6+Pig 0.8 branch last weekend, and I was very impressed by both
> the speed and performance of this combination.  I encourage people to
> try it out.
>
> Regards,
> Eric
>
> On 11/22/10 10:50 AM, "Ariel Rabkin" <asrabkin@gmail.com> wrote:
>
> I agree with Bill and Deshpande that we ought to make clear to users
> that they don't need HICC, and therefore don't need either MySQL or
> HBase.
>
> But I think what Eric meant to ask was which of MySQL and HBase ought
> to be the default *for HICC*.  My sense is that the HBase support
> isn't quite mature enough, but it's getting there.
>
> I think HBase is ultimately the way to go. I think we might benefit as
> a community by doing a 0.5 release first, while waiting for the
> pig-based aggregation support that's blocking HBase.
>
> --Ari
>
> On Mon, Nov 22, 2010 at 10:47 AM, Deshpande, Deepak
> <ddeshpande@verisign.com> wrote:
>> I agree. Making HBase the default would make some Chukwa users' lives
>> difficult. In my setup, I don't need HDFS. I am using Chukwa merely as
>> a log streaming framework. I have plugged in my own writer to write
>> log files to the local file system (instead of HDFS). I evaluated
>> Chukwa against other frameworks, and Chukwa had better built-in fault
>> tolerance than the others. This made me recommend Chukwa over other
>> frameworks.
>>
>> Making HBase the default option would definitely make my life difficult :).
>>
>> Thanks,
>> Deepak Deshpande
>>
>
>
> --
> Ari Rabkin asrabkin@gmail.com
> UC Berkeley Computer Science Department
>
>
