hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: One weird problem of my MR job upon hbase table.
Date Mon, 07 Jan 2013 16:59:29 GMT
Where did he mention he was attempting to bond the ports? 
Sorry if I missed it.

On Jan 7, 2013, at 7:37 AM, Doug Meil <doug.meil@explorysmedical.com> wrote:

> 
> Hi there, 
> 
> The HBase RefGuide has a comprehensive case study on such a case.  This
> might not be the exact problem, but the diagnostic approach should help.
> 
> http://hbase.apache.org/book.html#casestudies.slownode
> 
> 
> 
> 
> 
> On 1/4/13 10:37 PM, "Liu, Raymond" <raymond.liu@intel.com> wrote:
> 
>> Hi
>> 
>> I have run into a weird issue where some map tasks lag behind:
>> 
>> I have a small hadoop/hbase cluster with 1 master node and 4 regionserver
>> nodes, all with 16 CPUs and with the map and reduce slots set to 24.
>> 
>> A few tables were created with regions distributed evenly across the
>> region servers (say 16 regions per region server). Each region also has
>> almost the same number of KVs, of very similar size. All tables had
>> major_compact run to ensure data locality.
>> 
>> I have an MR job which simply does a local region scan in every map task
>> (so 16 map tasks for each regionserver node).
>> 
>> In theory, every map task should finish in a similar time.
>> 
>> But in practice, some regions on the same region server always lag
>> behind a lot, taking say 150~250% of the average time of the other map
>> tasks.
>> 
>> If this happened to a single region server for every table, I would
>> suspect a disk issue or something else dragging down the performance
>> of that region server.
>> 
>> But the weird thing is that although, for any single table, almost all
>> the map tasks on one particular regionserver lag behind, the lagging
>> regionserver is different for different tables! And the regions and
>> region sizes are distributed evenly, which I have double-checked many
>> times. (I even tried setting the replica count to 4 to ensure every node
>> has a local copy of the data.)
>> 
>> Say for table 1, all map tasks on regionserver node 2 are slow, while for
>> table 2, maybe all map tasks on regionserver node 3 are slow. With table
>> 1, it will always be regionserver node 2 which is slow regardless of
>> cluster restarts, and the slowest map task will always be the very same
>> one. And it won't go away even if I do a major compaction again...
>> 
>> So, could anyone give me a clue as to what might possibly lead to
>> this weird behavior? Any wild guess is welcome!
>> 
>> (BTW, I did not encounter this issue a few days ago with the same table.
>> I did restart the cluster and make a few changes to the config files
>> during that period, but restoring the config files doesn't help.)
>> 
>> 
>> Best Regards,
>> Raymond Liu
>> 
>> 
> 
> 
> 

