From: Bharath Vissapragada <bharathv@cloudera.com>
Date: Fri, 4 Oct 2013 15:15:03 +0530
Subject: Re: Newly added regionserver is not serving requests
To: user@hbase.apache.org

One possibility is that the regions were balanced only after the write load
had finished. In other words, while the data was being written the regions
all sat on the original regionservers, and only once the load was done were
some of them assigned to the idle RS. Are you sure that YCSB keeps writing to
those regions after the balancing? You can also run your benchmark again now
(after the regions are balanced), write some data to the regions on the idle
RS, and see whether its request count increases.
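
If it helps, below is a rough sketch of such a probe against the 0.94 client
API (I'm going from memory on HTable#getRegionLocations and friends, so please
double-check against your client jar). It writes one row into each region that
the quiet server currently hosts; afterwards you can re-run status 'simple'
and see whether its request count moves. The table name "usertable" and column
family "family" are just the usual YCSB settings, so adjust them to your setup.

    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ProbeIdleRegionServer {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "usertable");   // YCSB table name (adjust)
        String idleHost = "okeanos-nodes-4";            // the RS showing 0 requests
        try {
          for (Map.Entry<HRegionInfo, ServerName> entry
              : table.getRegionLocations().entrySet()) {
            if (!entry.getValue().getHostname().startsWith(idleHost)) {
              continue;
            }
            // Append a suffix to the region's start key so the probe row
            // normally lands inside this region's key range.
            byte[] row = Bytes.add(entry.getKey().getStartKey(),
                Bytes.toBytes("-probe"));
            Put put = new Put(row);
            put.add(Bytes.toBytes("family"), Bytes.toBytes("f0"),
                Bytes.toBytes("probe-value"));
            table.put(put);
            System.out.println("Wrote probe row into "
                + entry.getKey().getRegionNameAsString());
          }
          table.flushCommits();
        } finally {
          table.close();
        }
      }
    }

If the request count on that server stays at 0 even after writes like these,
then the idle regions are not just a side effect of when the balancer ran, and
it is worth digging into the client/assignment side instead.
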
On Fri, Oct 4, 2013 at 2:37 PM, Thanasis Naskos wrote:

> I'm setting up an HBase cluster on a cloud infrastructure.
>
> HBase version: 0.94.11
> Hadoop version: 1.0.4
>
> Currently I have 4 nodes in my cluster (1 master, 3 regionservers) and I'm
> using YCSB (Yahoo benchmarks) to create a table (500,000 rows) and send
> asynchronous requests. Everything works fine with this setup (I'm monitoring
> the whole process with Ganglia and getting lambda, throughput and latency
> combined with YCSB's output), but the problem occurs when I add a new
> regionserver on-the-fly: it doesn't get any requests.
>
> What "on-the-fly" means:
> While YCSB is sending requests to the cluster, I'm adding new regionservers
> using Python scripts.
>
> Addition process (while the cluster is serving requests):
>
> 1. I create a new VM which will act as the new regionserver and configure
>    every needed aspect (hbase, hadoop, /etc/hosts, connecting it to the
>    private network, etc.)
> 2. I stop the **hbase** balancer
> 3. I configure every node in the cluster with the new node's information
>    * adding the hostname to the regionservers file
>    * adding the hostname to hadoop's slaves file
>    * adding the hostname and IP to the /etc/hosts file of every node
>    * etc.
> 4. I execute on the master node:
>    * `hadoop/bin/start-dfs.sh`
>    * `hadoop/bin/start-mapred.sh`
>    * `hbase/bin/start-hbase.sh`
>    (I've also tried running `hbase start regionserver` on the newly added
>    node; it does exactly the same as the last command - it starts the
>    regionserver)
> 5. Once the newly added node is up and running, I run the **hadoop** load
>    balancer
> 6. When the hadoop load balancer stops, I start the **hbase** load balancer
>    again
>
> I connect over ssh to the master node and check that the load balancers
> (hbase/hadoop) did their job: both the blocks and the regions are uniformly
> spread across all the regionservers/slaves, including the new one. But when
> I run status 'simple' in the hbase shell I see that the new regionservers
> are not getting any requests. (Below is the output of the command after
> adding 2 new regionservers, "okeanos-nodes-4/5".)
>
> hbase(main):008:0> status 'simple'
> 5 live servers
>     okeanos-nodes-1:60020 1380865800330
>         requestsPerSecond=5379, numberOfOnlineRegions=4, usedHeapMB=175, maxHeapMB=3067
>     okeanos-nodes-2:60020 1380865800738
>         requestsPerSecond=5674, numberOfOnlineRegions=4, usedHeapMB=161, maxHeapMB=3067
>     okeanos-nodes-5:60020 1380867725605
>         requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=27, maxHeapMB=3067
>     okeanos-nodes-3:60020 1380865800162
>         requestsPerSecond=3871, numberOfOnlineRegions=5, usedHeapMB=162, maxHeapMB=3067
>     okeanos-nodes-4:60020 1380866702216
>         requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=29, maxHeapMB=3067
> 0 dead servers
> Aggregate load: 14924, regions: 19
>
> The fact that they don't serve any requests is also evidenced by the CPU
> usage: on a serving regionserver it is about 70%, while on these 2
> regionservers it is about 2%.
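
A quick way to tell whether the regions on those two servers see any traffic
at all, rather than looking only at the server-level requestsPerSecond, is to
dump the per-region read/write counters from ClusterStatus. Another rough
sketch against the 0.94 client API (again from memory, so treat the method
names as something to verify rather than gospel):

    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.ClusterStatus;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HServerLoad;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class PrintRegionRequestCounts {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          ClusterStatus status = admin.getClusterStatus();
          for (ServerName server : status.getServers()) {
            HServerLoad load = status.getLoad(server);
            System.out.println(server.getHostname()
                + ": requests=" + load.getNumberOfRequests()
                + ", regions=" + load.getNumberOfRegions());
            // Cumulative read/write request counts per region since the
            // regionserver started.
            for (Map.Entry<byte[], HServerLoad.RegionLoad> entry
                : load.getRegionsLoad().entrySet()) {
              HServerLoad.RegionLoad rl = entry.getValue();
              System.out.println("    " + rl.getNameAsString()
                  + " reads=" + rl.getReadRequestsCount()
                  + " writes=" + rl.getWriteRequestsCount());
            }
          }
        } finally {
          admin.close();
        }
      }
    }

If every region on okeanos-nodes-4/5 stays at zero reads and writes while the
counters on the other servers keep climbing, that would confirm that no
requests are being routed to those regions at all.
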
> Below is the output of `hadoop dfsadmin -report`; as you can see, the blocks
> are evenly distributed (according to `hadoop balancer -threshold 2`).
>
> root@okeanos-nodes-master:~# /opt/hadoop-1.0.4/bin/hadoop dfsadmin -report
> Configured Capacity: 105701683200 (98.44 GB)
> Present Capacity: 86440648704 (80.5 GB)
> DFS Remaining: 84188446720 (78.41 GB)
> DFS Used: 2252201984 (2.1 GB)
> DFS Used%: 2.61%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
>
> -------------------------------------------------
> Datanodes available: 5 (5 total, 0 dead)
>
> Name: 10.0.0.11:50010
> Decommission Status : Normal
> Configured Capacity: 21140336640 (19.69 GB)
> DFS Used: 309166080 (294.84 MB)
> Non DFS Used: 3851579392 (3.59 GB)
> DFS Remaining: 16979591168 (15.81 GB)
> DFS Used%: 1.46%
> DFS Remaining%: 80.32%
> Last contact: Fri Oct 04 11:30:31 EEST 2013
>
> Name: 10.0.0.3:50010
> Decommission Status : Normal
> Configured Capacity: 21140336640 (19.69 GB)
> DFS Used: 531652608 (507.02 MB)
> Non DFS Used: 3852300288 (3.59 GB)
> DFS Remaining: 16756383744 (15.61 GB)
> DFS Used%: 2.51%
> DFS Remaining%: 79.26%
> Last contact: Fri Oct 04 11:30:32 EEST 2013
>
> Name: 10.0.0.5:50010
> Decommission Status : Normal
> Configured Capacity: 21140336640 (19.69 GB)
> DFS Used: 502910976 (479.61 MB)
> Non DFS Used: 3853029376 (3.59 GB)
> DFS Remaining: 16784396288 (15.63 GB)
> DFS Used%: 2.38%
> DFS Remaining%: 79.4%
> Last contact: Fri Oct 04 11:30:32 EEST 2013
>
> Name: 10.0.0.4:50010
> Decommission Status : Normal
> Configured Capacity: 21140336640 (19.69 GB)
> DFS Used: 421974016 (402.43 MB)
> Non DFS Used: 3852365824 (3.59 GB)
> DFS Remaining: 16865996800 (15.71 GB)
> DFS Used%: 2%
> DFS Remaining%: 79.78%
> Last contact: Fri Oct 04 11:30:29 EEST 2013
>
> Name: 10.0.0.10:50010
> Decommission Status : Normal
> Configured Capacity: 21140336640 (19.69 GB)
> DFS Used: 486498304 (463.96 MB)
> Non DFS Used: 3851759616 (3.59 GB)
> DFS Remaining: 16802078720 (15.65 GB)
> DFS Used%: 2.3%
> DFS Remaining%: 79.48%
> Last contact: Fri Oct 04 11:30:29 EEST 2013
>
> I've tried stopping YCSB, restarting the hbase master and restarting YCSB,
> but with no luck: these 2 nodes don't serve any requests!
>
> As there are many log and conf files, I have created a zip file with the
> logs and confs (both hbase and hadoop) of the master, a healthy regionserver
> serving requests and a regionserver not serving requests:
> https://dl.dropboxusercontent.com/u/13480502/hbase_hadoop_logs__conf.zip
>
> Thank you in advance!!

--
Bharath Vissapragada