hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment?
Date Mon, 20 Jun 2011 17:39:13 GMT
I think you could store deltas and roll them up later. You would have
to store them under a qualifier that's unique to each job, so that
task failures and speculative execution (if enabled) only overwrite
a cell instead of incrementing it again. At read time you would need
to sum those columns together.
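
A minimal sketch of that idea, using a plain Python dict as a stand-in
for one HBase row. None of this is HBase API: `increment`, `put_delta`,
`read_total`, the `delta:` qualifier prefix, and the job ids are all
made up for illustration. It only shows why a repeated task attempt
double-counts with an increment but not with an overwrite.

```python
# Toy model of a single row: qualifier -> cell value.
# Hypothetical helpers, not the HBase client API.

def increment(row, qualifier, amount):
    """Models an Increment: adds to whatever is already stored."""
    row[qualifier] = row.get(qualifier, 0) + amount


def put_delta(row, job_id, amount):
    """Models a Put: overwrites the cell under a job-unique qualifier."""
    row["delta:" + job_id] = amount


def read_total(row):
    """Read-time rollup: sum every delta column in the row."""
    return sum(v for q, v in row.items() if q.startswith("delta:"))


# The same task attempt runs twice (a retry or speculative execution).
inc_row = {}
for _attempt in range(2):
    increment(inc_row, "total", 5)

delta_row = {}
for _attempt in range(2):
    put_delta(delta_row, "job_001", 5)
put_delta(delta_row, "job_002", 3)  # a later job adds its own column

print(inc_row["total"])       # 10: the delta of 5 was counted twice
print(read_total(delta_row))  # 8: 5 from job_001 plus 3 from job_002
```

The rollup can later collapse the delta columns into a single value, as
long as that step is itself safe to re-run.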

J-D

On Fri, Jun 17, 2011 at 4:12 PM, Leif Wickland <leifwickland@gmail.com> wrote:
> Interesting (and mildly terrifying) point, Ryan.
>
> Is there a valid pattern for storing a sum in HBase then using mapreduce to
> calculate an update to that sum based on incremental data updates?
>
> It seems a cycle like the following would avoid double increment problems,
> but would suffer from a monster race condition.
>
> 1. Mapreduce updated values into aggregates (written to HDFS).
> 2. Mapreduce aggregates with existing value in HBase into new target value
> for HBase (but written to HDFS).
> 3. Mapreduce writing new values to HBase.
>
> Please tell me there's a better way.
>
> Thanks,
>
> Leif
>
> On Fri, Jun 17, 2011 at 3:33 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>
>> Watch out - increment is not idempotent, so you will have to somehow
>> ensure that a map runs exactly 1x and never more or less than that.
>> Job failures will ruin the data as well.
>>
>> -ryan
>>
>> On Fri, Jun 17, 2011 at 1:57 PM, Stack <stack@duboce.net> wrote:
>> > Go for it!
>> > St.Ack
>> >
>> > On Fri, Jun 17, 2011 at 1:43 PM, Leif Wickland <leifwickland@gmail.com> wrote:
>> >> I tried to use TableMapper and TableOutputFormat from
>> >> org.apache.hadoop.hbase.mapreduce to write a map-reduce job which
>> >> incremented some columns.  I noticed that TableOutputFormat.write()
>> >> doesn't support Increment, only Put and Delete.
>> >>
>> >> Is there a reason that TableOutputFormat shouldn't support increment?
>> >>
>> >> I think adding support for increment would only require adding a copy
>> >> constructor to Increment and a few lines to TableOutputFormat.  I'd be
>> >> willing to give writing the patch a try if there's no objection.
>> >>
>> >> Leif Wickland
>> >>
>> >
>>
>
