hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Increment Counters in HBase during MapReduce
Date Sun, 24 Jun 2012 23:19:11 GMT
There are a couple of issues and I'm sure others will point them out. 

If you turn off speculative execution on the job, you don't get duplicate tasks running in
You could create a table to store your aggregations on a per job basis where your row-id could
incorporate your job-id. 
Then, at the end of the job, if you didn't have any task failures or speculatively executed
tasks, you could count on your aggregations being correct. 
If a task failed or was killed (a simple test for whether the job somehow ran with speculative
execution), you could discard that row's data. 
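
To make the per-job row idea concrete, here is a rough sketch. It is only an illustration, not
tested against a cluster: an in-memory map stands in for the HBase table, so each increment()
below is where a real mapper would call incrementColumnValue. The "<jobId>:<metric>" row-key
layout, the class name and the method names are all made up for the example, and the count of
failed or killed tasks is assumed to come from the job's own status once it finishes.

```java
import java.util.HashMap;
import java.util.Map;

class PerJobCounters {
    // In-memory stand-in for the aggregation table: row key -> counter value.
    private final Map<String, Long> table = new HashMap<>();

    // Hypothetical row-key layout that incorporates the job id.
    static String rowKey(String jobId, String metric) {
        return jobId + ":" + metric;
    }

    // Stand-in for an incrementColumnValue call made from a task.
    void increment(String jobId, String metric, long amount) {
        table.merge(rowKey(jobId, metric), amount, Long::sum);
    }

    // Returns the job's total only if no task failed or was killed.
    // Otherwise the row may contain double counts, so it is discarded
    // and null is returned.
    Long trustedTotal(String jobId, String metric, int failedOrKilledTasks) {
        String key = rowKey(jobId, metric);
        if (failedOrKilledTasks > 0) {
            table.remove(key);
            return null;
        }
        return table.get(key);
    }

    public static void main(String[] args) {
        PerJobCounters counters = new PerJobCounters();

        // Clean job: one task sees three records, no retries.
        for (int i = 0; i < 3; i++) {
            counters.increment("job_A", "impressions", 1);
        }
        System.out.println("job_A: " + counters.trustedTotal("job_A", "impressions", 0));

        // Job with a retry: the first attempt counts two records and dies,
        // the second attempt reprocesses all three. The row now holds 5
        // instead of 3, because increments are not idempotent.
        for (int i = 0; i < 2; i++) {
            counters.increment("job_B", "impressions", 1);
        }
        for (int i = 0; i < 3; i++) {
            counters.increment("job_B", "impressions", 1);
        }
        System.out.println("job_B: " + counters.trustedTotal("job_B", "impressions", 1));
    }
}
```

The point of keying by job-id is that a bad run only contaminates its own row. You throw that
row away and rerun the job, and the totals from earlier jobs are untouched.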

On Jun 24, 2012, at 4:15 PM, David Koch wrote:

> Hello J-D
> I have a similar requirement as that presented by the original poster, i.e
> updating a totals count without having to push the entire data set through
> the Mapper again.
> Are you advising against calling incrementColumnValue on a mapper's HTable
> instance because the operation is not idempotent or are there other
> reasons? It is even suggested in the docs:
> http://hbase.apache.org/book/mapreduce.example.html (section 7.2.6).
> Do you know of any "count-exactly-once" implementations on top of Hadoop
> Map/Reduce?
> Thanks,
> /David
> On Tue, Jun 19, 2012 at 6:55 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> This question was answered here already:
>> http://mail-archives.apache.org/mod_mbox/hbase-user/201101.mbox/%3CAANLkTinnW2d7DMCyFu3ptv1Hu_i3XqK_1pDSgD5NT_Lk@mail.gmail.com%3E
>> Counters are not idempotent, this can be hard to manage.
>> J-D
>> On Mon, Jun 18, 2012 at 5:49 PM, Sid Kumar <sqlsid101@gmail.com> wrote:
>>> Hi everyone,
>>>   I have a use case in HBase that I was wondering if someone may have
>>> stumbled upon. I am maintaining an ad impressions table with columns that
>>> are counters for certain metrics. I started using the incrementColumnValue
>>> method part of the HTable API to update these metrics and that works great.
>>>   I was wondering if this function could be used from a MapReduce job.
>>> The TableOutputFormat supports only Delete and Put operations. Using the
>>> Incremental counters saves me from doing any aggregations in my Map Reduce
>>> code. Ideally i would like to just call this function in my mapper and
>>> wouldn't even need a Reducer.
>>>   Has anyone run into this use case? I would also love to know if there
>>> are any better alternatives of solving this too. Any info would be great.
>>> Thanks
>>> Sid
