Subject: RE: Scalability problem with HBase
From: "Geoff Hendrey" <ghendrey@decarta.com>
To: user@hbase.apache.org
Date: Sun, 23 Jan 2011 17:40:49 -0800

just curious what you mean by "reverse search index".

-g

-----Original Message-----
From: Thibault Dory [mailto:dory.thibault@gmail.com]
Sent: Sunday, January 23, 2011 1:42 PM
To: user@hbase.apache.org
Subject: Scalability problem with HBase

Hello,

I'm currently testing the performance of HBase for a specific test case. I have downloaded ~20000 articles from Wikipedia and I want to test read/write and MapReduce performance. I'm using HBase 0.20.6 and Hadoop 0.20.2 on a cluster of Ubuntu servers connected with Gigabit Ethernet.

My test works like this:
- I start with 3 physical servers, used like this: 3 Hadoop nodes (1 namenode and 3 datanodes) and, for HBase, 1 master and 3 regionservers.
- I insert all the articles, one article per row, each row containing two cells: ID and article.
- I start 3 threads from another machine that read and update articles at random (an update simply appends the string "1" to the end of the article), and I measure the time needed for all the operations to finish (see the client-loop sketch below).
- I build a reverse search index using two phases of MapReduce and measure the time to compute it (see the mapper sketch below).
- Then I add a new server, on which I start a datanode and a regionserver, and I run the benchmark again with 4 threads.
- I repeat those steps until I reach the last available server (8 in total).

I keep the total number of operations constant, and appending "1" to an article does not change its size much.
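A minimal sketch of what each client thread does, using the HBase 0.20 client API (the table name "articles" and the column "content:article" are placeholders, not necessarily the exact schema):

import java.util.Random;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadUpdateThread implements Runnable {
  // Placeholder column layout: one family "content" with one qualifier "article".
  private static final byte[] FAMILY = Bytes.toBytes("content");
  private static final byte[] QUALIFIER = Bytes.toBytes("article");

  private final int operations;    // operations this thread performs
  private final int articleCount;  // number of rows to pick from
  private final Random random = new Random();

  public ReadUpdateThread(int operations, int articleCount) {
    this.operations = operations;
    this.articleCount = articleCount;
  }

  public void run() {
    try {
      HTable table = new HTable(new HBaseConfiguration(), "articles");
      for (int i = 0; i < operations; i++) {
        byte[] row = Bytes.toBytes(Integer.toString(random.nextInt(articleCount)));
        // Read the article.
        Result result = table.get(new Get(row));
        byte[] article = result.getValue(FAMILY, QUALIFIER);
        // About 20% of the operations also write the row back with "1" appended.
        if (random.nextDouble() < 0.2) {
          Put put = new Put(row);
          put.add(FAMILY, QUALIFIER, Bytes.add(article, Bytes.toBytes("1")));
          table.put(put);
        }
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}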
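Roughly, the first MapReduce phase of the index job emits (word, article row key) pairs from each article, and the second phase aggregates the row keys per word. A simplified sketch of such a phase-one mapper with the HBase 0.20 TableMapper API (again, column names are placeholders):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

// Emits (word, article row key) pairs; a reducer and the second MapReduce
// phase would group the postings per word.
public class InvertedIndexMapper extends TableMapper<Text, Text> {
  private static final byte[] FAMILY = Bytes.toBytes("content");
  private static final byte[] QUALIFIER = Bytes.toBytes("article");

  @Override
  protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
      throws IOException, InterruptedException {
    byte[] raw = columns.getValue(FAMILY, QUALIFIER);
    if (raw == null) {
      return; // skip rows without an article cell
    }
    String article = Bytes.toString(raw);
    Text docId = new Text(rowKey.get());
    for (String word : article.toLowerCase().split("\\W+")) {
      if (word.length() > 0) {
        context.write(new Text(word), docId);
      }
    }
  }
}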
The problem is the kind of results I'm seeing. I expected the time needed to perform the read/write operations to decrease as I add new servers to the cluster, but I'm experiencing exactly the opposite. Moreover, the more requests I make, the slower the cluster becomes, for a constant data size.

For example, here are the results in seconds on my cluster just after the first insertion, with 3 nodes, for 10000 operations (20% of which update the articles):

Individual times : [30.338116567, 24.402751402, 25.650858953, 27.699796324, 26.589869283, 33.909433157, 52.538378122, 48.0114018, 47.149348721, 42.825791078]

Then, one minute after that run ends, everything else staying the same:

Individual times : [58.181552147, 48.931155328, 62.509309199, 57.198395723, 63.267397201, 54.160937835, 57.635454167, 64.780292628, 62.762390414, 61.381563914]

And finally, five minutes after the last run ends, everything else staying the same:

Individual times : [56.852388792, 58.011768345, 63.578745601, 68.008043323, 79.545419247, 87.183962628, 88.1989561, 94.532923849, 99.852569437, 102.355709259]

It seems quite clear that the time needed to perform the same amount of operations is rising fast. When I add servers to the cluster, the time needed to perform the operations keeps rising. Here are the results for 4 servers, using the same methodology as above:

Immediately after the new server is added:
Individual times : [86.224951713, 80.777746425, 84.814954717, 93.07842057, 83.348558502, 90.037499401, 106.799544002, 98.122952552, 97.057614119, 94.277285461]

One minute after the last test:
Individual times : [94.633454698, 101.250176482, 99.945406887, 101.754011832, 106.882328108, 97.808320021, 97.050036703, 95.844557847, 97.931572694, 92.258327247]

Five minutes after the last test:
Individual times : [98.188162512, 96.332809905, 93.598184149, 93.552745204, 96.905860067, 102.149408296, 101.545412423, 105.377292242, 108.855117219, 110.429000567]

The times needed to compute the inverse search index using MapReduce are rising too:

3 nodes
Results : [106.604148815, 104.829340323, 101.986450167, 102.871575842, 102.177574017]
4 nodes
Results : [120.451610507, 115.007344179, 115.075212636, 115.146883431, 114.216465299]
5 nodes
Results : [139.563445944, 132.933993434, 134.117730658, 132.927127084, 132.041046308]

I don't think this behaviour is normal; I should see the time needed to complete the same amount of work decrease as I add more servers to the cluster. Unless this is because my cluster is too small? I should say that all the servers in the cluster seem to use an equal amount of CPU while the test is running, so it looks like all of them are working and there is no server that is not storing data.

What do you think? Where did I screw up to see that kind of results with HBase?