hbase-user mailing list archives

From Ioakim Perros <imper...@gmail.com>
Subject Re: Reading in parallel from table's regions in MapReduce
Date Tue, 04 Sep 2012 15:43:21 GMT
Thank you very much for responding, but this was not exactly what I was 
looking for.

I understand how input is split when M/R jobs read from HBase tables
(each map task reads from exactly one region).

What I would like to clarify, if possible, is whether there is indeed
some "locking" between map tasks that read from different regions of the
same table (I noticed that the map tasks appear to read one after
another rather than concurrently),

and if so, how I could avoid it, so that the map tasks read data in
parallel (each from its respective region) and the job runs faster.

Thank you again very much. I hope there is an answer to this,
Ioakim

On 09/04/2012 06:32 PM, Doug Meil wrote:
> Hi there-
>
> Yes, there is an input split for each region of the source table of a MR
> job.
>
> There is a blurb on that in the RefGuide...
>
> http://hbase.apache.org/book.html#splitter
>
> On 9/4/12 11:17 AM, "Ioakim Perros" <imperros@gmail.com> wrote:
>
>> Hello,
>>
>> I would be grateful if someone could shed some light on the following:
>>
>> Each M/R map task reads data from a separate region of a table.
>> In the map completion graph of the JobTracker's GUI, I notice that
>> although the mappers read different data, they read it sequentially,
>> as if the table had a lock that allows only one mapper at a time to
>> read from any region.
>>
>> Does this "lock" hypothesis make sense? Is there any way I could avoid
>> this useless delay?
>>
>> Thanks in advance and regards,
>> Ioakim
>>
>

