hbase-user mailing list archives

From Thanasis Naskos <anas...@csd.auth.gr>
Subject Re: Newly added regionserver is not serving requests
Date Fri, 04 Oct 2013 13:07:52 GMT
[SOLVED] I found what was going on and it had nothing to do with HBase... 
I had forgotten to add the hostname and IP of the new RS to the /etc/hosts 
file of the YCSB server VM... :-(
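
(For anyone hitting the same thing: the fix was a one-line /etc/hosts 
entry on the YCSB client VM for each new RS. The hostname and IP below 
are illustrative, adjust to your own cluster:)

    # /etc/hosts on the YCSB client VM
    10.0.0.11   okeanos-nodes-4    # new RS (illustrative mapping)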

Thanks again, Bharath, for your interest!

On 10/04/2013 02:18 PM, Thanasis Naskos wrote:
>> One possibility could be that the regions got balanced after the write 
>> load was complete. That means, when the regions were being written they 
>> were with one RS, and once that was done, the regions got assigned to 
>> the idle RS.
>
> I think that this is the case, but why is this wrong? I write the data 
> to the database with 3 RS's, and when the write load is finished I add 
> one more RS and run the Hadoop and HBase load balancers to assign some 
> blocks and regions (respectively) to this new node (without adding new 
> data)... Shouldn't this work?
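>
> (For reference, this is roughly what "run hadoop and hbase load 
> balancers" amounts to; the HBase commands go through the hbase shell, 
> and the HDFS threshold is the one I use later in this thread:)
>
>     # in the hbase shell: re-enable and trigger the region balancer
>     balance_switch true
>     balancer
>
>     # on the master: spread HDFS blocks across the datanodes
>     hadoop/bin/hadoop balancer -threshold 2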
>
>> Are you sure that YCSB writes to the regions after balancing too?
>
> I should have mentioned that once the data is written to the RS's (3 
> RS's), YCSB sends only READ requests and doesn't write/insert/update 
> anything else to the database even after new nodes (RS's) are added.
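>
> (Roughly how the read-only phase is invoked; workload C is YCSB's 100% 
> read workload, and the column family, operation count, and thread count 
> below are illustrative:)
>
>     bin/ycsb run hbase -P workloads/workloadc \
>         -p columnfamily=family -p operationcount=500000 -threads 16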
>
>> Also, you can run your benchmark now (after the regions are balanced) 
>> and write some data to the regions on the idle RS and see if it 
>> increases the request count.
>
> I've tried to add (put) a new row to the database from inside the idle 
> RS (shell) and the row was inserted properly (I've checked it with 
> "get")... but, as expected, nothing changed; I still have 2 RS's idle.
>
> Thank you for your interest!!
>
> On 10/04/2013 12:45 PM, Bharath Vissapragada wrote:
>> One possibility could be that the regions got balanced after the write 
>> load was complete. That means, when the regions were being written they 
>> were with one RS, and once that was done, the regions got assigned to 
>> the idle RS.
>>
>> Are you sure that YCSB writes to the regions after balancing too?
>> Also, you can run your benchmark now (after the regions are balanced) 
>> and write some data to the regions on the idle RS and see if it 
>> increases the request count.
>>
>>
>> On Fri, Oct 4, 2013 at 2:37 PM, Thanasis Naskos <anaskos@csd.auth.gr> 
>> wrote:
>>
>>> I'm setting up an HBase cluster on a cloud infrastructure.
>>> HBase version: 0.94.11
>>> Hadoop version: 1.0.4
>>>
>>> Currently I have 4 nodes in my cluster (1 master, 3 regionservers) and I'm
>>> using YCSB (the Yahoo! Cloud Serving Benchmark) to create a table (500,000
>>> rows) and send asynchronous requests. Everything works fine with this
>>> setup (I'm monitoring the whole process with Ganglia and getting lambda,
>>> throughput, and latency combined with YCSB's output), but the problem
>>> occurs when I add a new regionserver on-the-fly, as it doesn't get any
>>> requests.
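>>>
>>> (For reference, the table is created and loaded with something like the
>>> following; the workload file, column family, and thread count are
>>> illustrative, only the record count is the one I actually use:)
>>>
>>>     bin/ycsb load hbase -P workloads/workloada \
>>>         -p columnfamily=family -p recordcount=500000 -threads 16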
>>>
>>> What "on-the-fly" means:
>>> While YCSB is sending requests to the cluster, I'm adding new
>>> regionservers using Python scripts.
>>>
>>> Addition Process (while the cluster is serving requests):
>>>
>>> 1. I'm creating a new VM which will act as the new regionserver and
>>>     configuring every needed aspect (hbase, hadoop, /etc/hosts, connecting
>>>     to the private network, etc.)
>>> 2. Stopping the **hbase** balancer
>>> 3. Configuring every node in the cluster with the new node's
>>>     information
>>>       * adding the hostname to the regionservers file
>>>       * adding the hostname to Hadoop's slaves file
>>>       * adding the hostname and IP to the /etc/hosts file of every node
>>>       * etc.
>>> 4. Executing on the master node:
>>>       * `hadoop/bin/start-dfs.sh`
>>>       * `hadoop/bin/start-mapred.sh`
>>>       * `hbase/bin/start-hbase.sh`
>>>         (I've also tried to run `hbase start regionserver` on the newly
>>>         added node and it does exactly the same as the last command -
>>>         starts the regionserver)
>>> 5. Once the newly added node is up and running I'm executing the **hadoop**
>>>     load balancer
>>> 6. When the hadoop load balancer finishes I'm starting the **hbase**
>>>     load balancer again (a condensed command sketch follows this list)
>>>
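>>> A condensed sketch of the commands behind steps 2-6 (paths follow my
>>> layout above; piping into the hbase shell is just one way to script it):
>>>
>>>     # step 2: disable the HBase region balancer
>>>     echo "balance_switch false" | hbase/bin/hbase shell
>>>
>>>     # step 4: on the master
>>>     hadoop/bin/start-dfs.sh
>>>     hadoop/bin/start-mapred.sh
>>>     hbase/bin/start-hbase.sh
>>>
>>>     # step 5: rebalance HDFS blocks onto the new datanode
>>>     hadoop/bin/hadoop balancer -threshold 2
>>>
>>>     # step 6: re-enable the HBase balancer and trigger a run
>>>     echo "balance_switch true" | hbase/bin/hbase shell
>>>     echo "balancer" | hbase/bin/hbase shell
>>>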
>>> I'm connecting over ssh to the master node and checking that the load
>>> balancers (hbase/hadoop) did their job, as both the blocks and the
>>> regions are uniformly spread across all the regionservers/slaves,
>>> including the new ones.
>>> But when I run `status 'simple'` in the hbase shell I see that the new
>>> regionservers are not getting any requests (below is the output of the
>>> command after adding 2 new regionservers, "okeanos-nodes-4/5"):
>>>
>>> hbase(main):008:0> status 'simple'
>>> 5 live servers
>>>     okeanos-nodes-1:60020 1380865800330
>>>         requestsPerSecond=5379, numberOfOnlineRegions=4, usedHeapMB=175, maxHeapMB=3067
>>>     okeanos-nodes-2:60020 1380865800738
>>>         requestsPerSecond=5674, numberOfOnlineRegions=4, usedHeapMB=161, maxHeapMB=3067
>>>     okeanos-nodes-5:60020 1380867725605
>>>         requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=27, maxHeapMB=3067
>>>     okeanos-nodes-3:60020 1380865800162
>>>         requestsPerSecond=3871, numberOfOnlineRegions=5, usedHeapMB=162, maxHeapMB=3067
>>>     okeanos-nodes-4:60020 1380866702216
>>>         requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=29, maxHeapMB=3067
>>> 0 dead servers
>>> Aggregate load: 14924, regions: 19
>>>
>>> The fact that they don't serve any requests is also evidenced by the
>>> CPU usage: on a serving regionserver it is about 70%, while on these 2
>>> regionservers it is about 2%.
>>>
>>> Below is the output of `hadoop dfsadmin -report`; as you can see, the
>>> blocks are evenly distributed (according to `hadoop balancer -threshold 2`):
>>>
>>> root@okeanos-nodes-master:~# /opt/hadoop-1.0.4/bin/hadoop dfsadmin -report
>>> Configured Capacity: 105701683200 (98.44 GB)
>>> Present Capacity: 86440648704 (80.5 GB)
>>> DFS Remaining: 84188446720 (78.41 GB)
>>> DFS Used: 2252201984 (2.1 GB)
>>> DFS Used%: 2.61%
>>> Under replicated blocks: 0
>>> Blocks with corrupt replicas: 0
>>> Missing blocks: 0
>>>
>>> -------------------------------------------------
>>> Datanodes available: 5 (5 total, 0 dead)
>>>
>>> Name: 10.0.0.11:50010
>>> Decommission Status : Normal
>>> Configured Capacity: 21140336640 (19.69 GB)
>>> DFS Used: 309166080 (294.84 MB)
>>> Non DFS Used: 3851579392 (3.59 GB)
>>> DFS Remaining: 16979591168(15.81 GB)
>>> DFS Used%: 1.46%
>>> DFS Remaining%: 80.32%
>>> Last contact: Fri Oct 04 11:30:31 EEST 2013
>>>
>>>
>>> Name: 10.0.0.3:50010
>>> Decommission Status : Normal
>>> Configured Capacity: 21140336640 (19.69 GB)
>>> DFS Used: 531652608 (507.02 MB)
>>> Non DFS Used: 3852300288 (3.59 GB)
>>> DFS Remaining: 16756383744(15.61 GB)
>>> DFS Used%: 2.51%
>>> DFS Remaining%: 79.26%
>>> Last contact: Fri Oct 04 11:30:32 EEST 2013
>>>
>>>
>>> Name: 10.0.0.5:50010
>>> Decommission Status : Normal
>>> Configured Capacity: 21140336640 (19.69 GB)
>>> DFS Used: 502910976 (479.61 MB)
>>> Non DFS Used: 3853029376 (3.59 GB)
>>> DFS Remaining: 16784396288(15.63 GB)
>>> DFS Used%: 2.38%
>>> DFS Remaining%: 79.4%
>>> Last contact: Fri Oct 04 11:30:32 EEST 2013
>>>
>>>
>>> Name: 10.0.0.4:50010
>>> Decommission Status : Normal
>>> Configured Capacity: 21140336640 (19.69 GB)
>>> DFS Used: 421974016 (402.43 MB)
>>> Non DFS Used: 3852365824 (3.59 GB)
>>> DFS Remaining: 16865996800(15.71 GB)
>>> DFS Used%: 2%
>>> DFS Remaining%: 79.78%
>>> Last contact: Fri Oct 04 11:30:29 EEST 2013
>>>
>>>
>>> Name: 10.0.0.10:50010
>>> Decommission Status : Normal
>>> Configured Capacity: 21140336640 (19.69 GB)
>>> DFS Used: 486498304 (463.96 MB)
>>> Non DFS Used: 3851759616 (3.59 GB)
>>> DFS Remaining: 16802078720(15.65 GB)
>>> DFS Used%: 2.3%
>>> DFS Remaining%: 79.48%
>>> Last contact: Fri Oct 04 11:30:29 EEST 2013
>>>
>>> I've tried stopping YCSB, restarting the hbase master, and restarting
>>> YCSB, but with no luck... these 2 nodes don't serve any requests!
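>>>
>>> (For completeness, restarting just the master amounts to the following,
>>> assuming the standard hbase-daemon.sh script:)
>>>
>>>     hbase/bin/hbase-daemon.sh stop master
>>>     hbase/bin/hbase-daemon.sh start master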
>>>
>>> As there are many log and conf files, I have created a zip file with the
>>> logs and confs (both hbase and hadoop) of the master, a healthy regionserver
>>> serving requests, and a regionserver not serving requests:
>>> https://dl.dropboxusercontent.com/u/13480502/hbase_hadoop_logs__conf.zip
>>>
>>>
>>> Thank you in advance!!
>>>
>>>
>>
>

