hbase-user mailing list archives
From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Coprocessor end point vs MapReduce?
Date Thu, 18 Oct 2012 01:19:32 GMT
Hi Mike,

I'm expecting to run the job weekly. I initially thought about using
end points because I found HBASE-6942, which looked like a good
example for my use case.

I'm fine with doing the Puts in the MapReduce job, but I'm not sure
about the deletes. That's why I looked at coprocessors. Then I figured
I could also do the Puts on the coprocessor side.

In a M/R job, can I delete the row I'm processing based on some
criterion like its timestamp? If I do that, I won't be doing bulk
deletes; I'll be deleting the rows one by one, right? That might be
very slow.

If in the future I want to run the job daily, might that be an issue?

Or should I go with the initial idea of doing the Put with the M/R job
and the delete with HBASE-6942?
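For reference, here is a rough sketch of the map-only variant being discussed: one TableMapper that re-emits each row as a Put against the destination table and a row-by-row Delete against the source, using MultiTableOutputFormat so a single job can write to both tables. The table names and the time-range bounds are made up for illustration, and this assumes a 0.94-era HBase classpath; it is a sketch, not a tested implementation.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;

public class MoveRowsJob {

  // Hypothetical table names -- substitute your own.
  private static final byte[] SOURCE = Bytes.toBytes("source_table");
  private static final byte[] DEST = Bytes.toBytes("archive_table");

  static class MoveMapper extends TableMapper<ImmutableBytesWritable, Writable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context ctx)
        throws IOException, InterruptedException {
      // Re-emit every cell of the row as a Put on the destination table.
      Put put = new Put(row.get());
      for (KeyValue kv : result.raw()) {
        put.add(kv);
      }
      ctx.write(new ImmutableBytesWritable(DEST), put);

      // Delete the whole row from the source table. Note this is one
      // Delete per row (row-by-row), not a bulk delete as in HBASE-6942.
      ctx.write(new ImmutableBytesWritable(SOURCE), new Delete(row.get()));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "move-rows-between-timestamps");
    job.setJarByClass(MoveRowsJob.class);

    Scan scan = new Scan();
    // Hypothetical bounds: only cells with timestamps in [min, max).
    scan.setTimeRange(1349913600000L, 1350518400000L);
    scan.setCaching(500);
    scan.setCacheBlocks(false); // don't pollute the block cache on a full scan

    TableMapReduceUtil.initTableMapperJob("source_table", scan,
        MoveMapper.class, ImmutableBytesWritable.class, Writable.class, job);
    job.setOutputFormatClass(MultiTableOutputFormat.class);
    job.setNumReduceTasks(0); // map-only, per Mike's suggestion

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

One caveat with this shape: the Deletes go through the normal write path one row at a time, which is exactly the slowness concern raised above; the HBASE-6942 endpoint avoids that by deleting server-side per region.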



2012/10/17, Michael Segel <michael_segel@hotmail.com>:
> Hi,
> I'm a firm believer in KISS (Keep It Simple, Stupid)
> The Map/Reduce (map job only) is the simplest and least prone to failure.
> Not sure why you would want to do this using coprocessors.
> How often are you running this job? It sounds like it's going to be
> sporadic.
> -Mike
> On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> wrote:
>> Hi,
>> Can someone please help me understand the pros and cons of these
>> two options for the following use case?
>> I need to transfer all the rows between 2 timestamps to another table.
>> My first idea was to run a MapReduce job to map the rows and store
>> them in another table, and then delete them using an endpoint
>> coprocessor. But the more I look into it, the more I think MapReduce
>> is not a good idea and I should use a coprocessor instead.
>> BUT... The MapReduce framework guarantees me that it will run
>> against all the regions. I tried stopping a regionserver while the
>> job was running. The region moved, and MapReduce restarted the task
>> from the new location. Will a coprocessor do the same thing?
>> Also, I found the web console for MapReduce with the number of
>> jobs, their status, etc. Is there the same thing for coprocessors?
>> Do all coprocessors run at the same time on all regions, which would
>> mean we could have 100 of them running on a regionserver at once? Or
>> do they run like MapReduce tasks, based on some configured values?
>> Thanks,
>> JM
