hbase-user mailing list archives

From Ioakim Perros <imper...@gmail.com>
Subject Re: Reading in parallel from table's regions in MapReduce
Date Tue, 04 Sep 2012 16:50:39 GMT
I understand that locking is at the row level (and that my initial
hypothesis is hopefully false), but I was trying to clarify whether there
is some job configuration I am missing. Perhaps you're right and I am
misinterpreting the jobtracker's map completion graph.

Thanks for answering.

On 09/04/2012 07:41 PM, Michael Segel wrote:
> I think the issue is that you are misinterpreting what you are seeing and what Doug was
> trying to tell you...
> The short simple answer is that you're getting one split per region. Each split is assigned
> to a specific mapper task and that task will sequentially walk through the table finding the
> rows that match your scan request.
> There is no lock or blocking.
> I think you really should actually read Lars George's book on HBase to get a better understanding.
> -Mike
> On Sep 4, 2012, at 11:29 AM, Ioakim Perros <imperros@gmail.com> wrote:
>> Thank you very much for your response and for the excellent reference.
>> The thing is that I am running jobs in a distributed environment and, beyond the TableMapReduceUtil,
>> I have just set the scan's caching to the number of rows I expect to retrieve at
>> each map task, and the scan's caching blocks feature to false (just as indicated in
>> the MapReduce examples on HBase's homepage).
>> I am not aware of such a job configuration (requesting the jobtracker to execute more
>> than one map task concurrently). Any other ideas?
>> Thank you again and regards,
>> ioakim
>> On 09/04/2012 06:59 PM, Jerry Lam wrote:
>>> Hi Ioakim:
>>> Sorry, your hypothesis doesn't make sense. I would suggest you read
>>> "Learning HBase Internals" by Lars Hofhansl at
>>> http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final
>>> to understand how HBase locking works.
>>> Regarding the issue you are facing, are you sure you configured the job
>>> properly (i.e. requesting the jobtracker to have more than 1 mapper
>>> executing)? If you are testing on a single machine, you probably need to
>>> configure the number of tasktrackers per node as well to see more than 1
>>> mapper executing on a single machine.
>>> my $0.02
>>> Jerry
>>> On Tue, Sep 4, 2012 at 11:17 AM, Ioakim Perros <imperros@gmail.com> wrote:
>>>> Hello,
>>>> I would be grateful if someone could shed some light on the following:
>>>> Each M/R map task is reading data from a separate region of a table.
>>>> From the jobtracker's GUI, at the map completion graph, I notice that
>>>> although the mappers read different data, they read it sequentially,
>>>> as if the table had a lock that permits only one mapper at a time to
>>>> read data from any region.
>>>> Does this "lock" hypothesis make sense? Is there any way I could avoid
>>>> this useless delay?
>>>> Thanks in advance and regards,
>>>> Ioakim
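
The replies above come down to two points: the job gets one split per region, with each split walked sequentially by its own map task, and the number of map tasks that can run at once depends on how the cluster is configured. The sketch below is a toy model of that scheduling, not HBase or Hadoop code. The class name, method names, and the idea of a fixed number of "slots" with equal-length tasks are assumptions made for illustration. It shows how a completion graph can look strictly sequential with no lock involved when only one map task can run at a time.

```java
import java.util.Arrays;

// Toy model (hypothetical, not part of HBase or Hadoop): one split per region,
// each split handled by one map task, and a fixed number of map tasks allowed
// to run concurrently ("slots").
public class SplitScheduleSketch {

    // Number of back-to-back rounds needed to run all splits.
    public static int waves(int splits, int slots) {
        if (splits < 0 || slots < 1) {
            throw new IllegalArgumentException("need splits >= 0 and slots >= 1");
        }
        return (splits + slots - 1) / slots; // ceiling division
    }

    // Start time of each split, assuming every split takes the same time and
    // each split is handed to the slot that becomes free earliest.
    public static int[] startTimes(int splits, int slots, int secondsPerSplit) {
        if (splits < 0 || slots < 1) {
            throw new IllegalArgumentException("need splits >= 0 and slots >= 1");
        }
        int[] slotFreeAt = new int[slots];
        int[] start = new int[splits];
        for (int i = 0; i < splits; i++) {
            int best = 0;
            for (int s = 1; s < slots; s++) {
                if (slotFreeAt[s] < slotFreeAt[best]) {
                    best = s;
                }
            }
            start[i] = slotFreeAt[best];
            slotFreeAt[best] += secondsPerSplit;
        }
        return start;
    }

    public static void main(String[] args) {
        // Four regions, one slot: tasks start one after another.
        System.out.println(Arrays.toString(startTimes(4, 1, 10)));
        // Four regions, four slots: all tasks start together.
        System.out.println(Arrays.toString(startTimes(4, 4, 10)));
    }
}
```

With one slot the four tasks start at 0, 10, 20 and 30 seconds, which would look like a lock in a completion graph; with four slots they all start at 0. Whether this is what happened in the job discussed here is not established by the thread.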
