Subject: Re: Reading from HBase is too slow
From: Ted Yu <yuzhihong@gmail.com>
To: Tao Xiao <xiaotao.cs.nju@gmail.com>
Cc: Vladimir Rodionov <vrodionov@splicemachine.com>, user@spark.incubator.apache.org
Date: Tue, 30 Sep 2014 21:04:34 -0700

Can you launch a job which exercises TableInputFormat on the same table
without using Spark? This would show whether the slowdown is in HBase code
or somewhere else.

Cheers
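(For example -- assuming the CDH 5 / HBase 0.96 setup described further down in the
thread -- one way to exercise TableInputFormat without Spark is the RowCounter
MapReduce job that ships with HBase, run against the same table:

    hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'C_CONS'

If that job is also slow per region, the bottleneck is likely on the HBase/HDFS side
rather than in Spark.)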
On Mon, Sep 29, 2014 at 11:40 PM, Tao Xiao <xiaotao.cs.nju@gmail.com> wrote:

> I checked HBase UI. Well, this table is not completely evenly spread
> across the nodes, but I think to some extent it can be seen as nearly
> evenly spread - at least there is not a single node which has too many
> regions. Here is a screenshot of HBase UI.
>
> Besides, I checked the size of each region for this table in the HDFS
> shell as follows:
>
> -bash-4.1$ hadoop dfs -du -h /hbase/data/default/C_CONS
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
>
> 288  /hbase/data/default/C_CONS/.tabledesc
> 0  /hbase/data/default/C_CONS/.tmp
> 159.6 M  /hbase/data/default/C_CONS/0008c2494a5399d68495d9c8ae147821
> 76.7 M  /hbase/data/default/C_CONS/021d7d21d7faeb7b2a77835d6f86747e
> 81.3 M  /hbase/data/default/C_CONS/02a39a316ac6d2bda89e72e74aa18a6e
> 155.3 M  /hbase/data/default/C_CONS/02fe51bc077290febc85651d8ee31abc
> 173.4 M  /hbase/data/default/C_CONS/045859bcc70e36eb4d33f8ca3b7d9633
> 82.6 M  /hbase/data/default/C_CONS/05c868b6036cc4f1836f70be6215c851
> 74.1 M  /hbase/data/default/C_CONS/0816378c837f1f3b84f4d4060d22beb3
> 84.7 M  /hbase/data/default/C_CONS/083da8f5eb8a5b1cca76376449f357ca
> 346.6 M  /hbase/data/default/C_CONS/0ac70fcb1baea0896ea069a6bcc30898
> 333.8 M  /hbase/data/default/C_CONS/0b3be845bd4f5e958e8c9a18c8eaab21
> 72.7 M  /hbase/data/default/C_CONS/12c13610c50dbc8ab27f20b0ebf2bfc4
> 76.1 M  /hbase/data/default/C_CONS/1341966315d7e53be719d948d595bee0
> 72.4 M  /hbase/data/default/C_CONS/1acdbc05c502b11da4852a1f21228f44
> 70.0 M  /hbase/data/default/C_CONS/1b8f57d65f6c0e4de721e4c8f1944829
> 183.9 M  /hbase/data/default/C_CONS/1f1ae7ca9f725fcf9639a4d52086fa50
> 65.5 M  /hbase/data/default/C_CONS/20c10b96e2b9c40684aaeb6d0cfbf7c0
> 76.0 M  /hbase/data/default/C_CONS/22515194fe09adcd4cbb2f5307303c73
> 78.4 M  /hbase/data/default/C_CONS/236cd80393cb5b7c526bd2c45ce53a0a
> 150.0 M  /hbase/data/default/C_CONS/23bd80852f47b97b4122709ec844d4ed
> 81.6 M  /hbase/data/default/C_CONS/241b8bc415029dedf94c4a84e6c4ad3b
> 77.9 M  /hbase/data/default/C_CONS/27f1e59bde75ef3096a5bdd3eb402cd7
> 160.8 M  /hbase/data/default/C_CONS/30c2ae3be38b8cdf3b337054a7d61478
> 372.2 M  /hbase/data/default/C_CONS/31d606da71b35844d0cdc8a195c97d2e
> 182.6 M  /hbase/data/default/C_CONS/3274a022bc7419d426cf63caa1cc88e1
> 92.1 M  /hbase/data/default/C_CONS/344faae7971d87b51edf23f75a7c3746
> 154.7 M  /hbase/data/default/C_CONS/3b3f0c839bdb32ed2104f67c8a02da41
> 77.4 M  /hbase/data/default/C_CONS/3cf6b2bd0cfe85f3111d0ba1b84a60b4
> 71.5 M  /hbase/data/default/C_CONS/3f466db078d07e2ddddbfb11c681e0e3
> 77.8 M  /hbase/data/default/C_CONS/3f8c1b7dec05118eb9894bb591e32b2f
> 83.6 M  /hbase/data/default/C_CONS/45e105856fcb54748c48bd45e973a3b9
> 185.2 M  /hbase/data/default/C_CONS/4becd90d46a2d4a6bd8ecbe02b60892c
> 165.6 M  /hbase/data/default/C_CONS/4dcebd58c7013062c4a8583012a11b5a
> 67.3 M  /hbase/data/default/C_CONS/51f845d842605dda66b1ae01ad8a17e8
> 148.2 M  /hbase/data/default/C_CONS/532189155ab78dbd1e36aac3ab4878a8
> 172.6 M  /hbase/data/default/C_CONS/5401d9cb19adb9bd78718ea047e6d9d7
> 139.4 M  /hbase/data/default/C_CONS/547d2a8c54aae73e8f12b4570efd984c
> 89.5 M  /hbase/data/default/C_CONS/54cbac1f71c7781697052bb2aa1c5a18
> 101.3 M  /hbase/data/default/C_CONS/55263ce293327683b9c6e6098ec3e89a
> 85.2 M  /hbase/data/default/C_CONS/55f8c278e35de6bca5083c7a66e355fb
> 85.8 M  /hbase/data/default/C_CONS/57112558912e1de016327e115bc84f11
> 171.8 M  /hbase/data/default/C_CONS/572b886cbfe92ddcb97502f041953fb8
> 51  /hbase/data/default/C_CONS/6bd64d8cf6b38806731f7693bdd673c9
> 86.6 M  /hbase/data/default/C_CONS/7695703b7b527afc5f3524eee9b5d806
> 74.8 M  /hbase/data/default/C_CONS/7bb7567685f5e16a4379d7cf79de2ecc
> 120.1 M  /hbase/data/default/C_CONS/7c144bef991bb3c959d7ef6e2fa5036a
> 166.0 M  /hbase/data/default/C_CONS/7c7817eb3e531d5bda88b5f0de6a20de
> 173.5 M  /hbase/data/default/C_CONS/7d07c139575d007ecbb23fa946e39130
> 139.2 M  /hbase/data/default/C_CONS/8295aa701110ddf4055e8c3ca5bd9cad
> 91.7 M  /hbase/data/default/C_CONS/84b340d22471580ed8100d6614668eb1
> 81.2 M  /hbase/data/default/C_CONS/8605f4470498a01a5ec4c88e7ea8a458
> 78.3 M  /hbase/data/default/C_CONS/897da8e33275b80926ef38200132f819
> 234.4 M  /hbase/data/default/C_CONS/93f5ce30ed8e54cc282cb5b88fa28d76
> 126.3 M  /hbase/data/default/C_CONS/96dd1decd62e35c394bb8e7f6095f054
> 80.9 M  /hbase/data/default/C_CONS/998364405e57a7eedae094bca76a419e
> 184.8 M  /hbase/data/default/C_CONS/9df3b62b1bff59b67b75ad86d694b8c8
> 126.6 M  /hbase/data/default/C_CONS/a4531e06f3440349e7e6776b8bfedaf0
> 79.3 M  /hbase/data/default/C_CONS/aa0b8341d3ca925ed24309f46e0ab845
> 79.9 M  /hbase/data/default/C_CONS/aa45bfa549a439ded2a8b159a5c9caaa
> 84.9 M  /hbase/data/default/C_CONS/abae60b33de2999698a7452ff62dad08
> 87.0 M  /hbase/data/default/C_CONS/ac5ff05785bc6e07637106450c74d02a
> 80.7 M  /hbase/data/default/C_CONS/aca765b578b236978b11ec26c167a958
> 68.0 M  /hbase/data/default/C_CONS/b03614566cc8d521a9c983d418b57866
> 77.4 M  /hbase/data/default/C_CONS/b1ae0451f592b28eed8a58908f91293a
> 91.5 M  /hbase/data/default/C_CONS/b8396049e2b742108add1485c0eb4aeb
> 81.2 M  /hbase/data/default/C_CONS/b8d25b3e536b4fea5ee4ee2b21885c76
> 87.8 M  /hbase/data/default/C_CONS/bbfbe319705df23a23a89b40e52d89a8
> 81.3 M  /hbase/data/default/C_CONS/bccaeedc65d9295289f78aaec588cc3d
> 95.8 M  /hbase/data/default/C_CONS/c229d583958802571dfaa9a39453df0d
> 88.5 M  /hbase/data/default/C_CONS/c9d7a038243d1b3e2448a48007f1f9e0
> 158.8 M  /hbase/data/default/C_CONS/cca1bf1f013724af25d71ad4310e5d4a
> 212.8 M  /hbase/data/default/C_CONS/ccabf798734aa8e05798c43c132ad565
> 85.1 M  /hbase/data/default/C_CONS/d1cb54346e109b1ba76fd95aa4540161
> 84.4 M  /hbase/data/default/C_CONS/d4dd8c3fa81b751892689cc92a96aa99
> 139.5 M  /hbase/data/default/C_CONS/dc15ceeed21474b51086f3103cbd0074
> 97.7 M  /hbase/data/default/C_CONS/df20e2077f22e83ecd8e55550d52dea1
> 221.0 M  /hbase/data/default/C_CONS/e30d0d55e0887a676c8b79e03771ad23
> 75.7 M  /hbase/data/default/C_CONS/e6ed24ce0b3e1e903bd9757d28380f3a
> 74.9 M  /hbase/data/default/C_CONS/e9732d9905f5373fb0fd7a1ce033e17b
> 101.2 M  /hbase/data/default/C_CONS/f2a49dbaf018f0e45bbd7a758f123418
> 172.6 M  /hbase/data/default/C_CONS/f34645de36d3c1413ce83177e2118947
> 89.2 M  /hbase/data/default/C_CONS/f3db2bf3b7ffb7b4c0029eac5d631bdb
> 81.6 M  /hbase/data/default/C_CONS/f43b49c4f384853266e9ee45a98104a6
> 68.9 M  /hbase/data/default/C_CONS/fa4fb0047ec98fb10bf84fd72937f415
> 86.7 M  /hbase/data/default/C_CONS/fc69f349655676e046c9110550825f5a
> 155.0 M  /hbase/data/default/C_CONS/feb0835bdf73c257de11c65f18b1330d
> 75.2 M  /hbase/data/default/C_CONS/fff9fbe56af8b9e0e00826f8936e7a56
>
> From the result above we can see that the biggest region is 346.6 M,
> while most of the other regions are close to each other in size.
>
> So what may be the real reason?
>
> 2014-09-30 12:17 GMT+08:00 Vladimir Rodionov <vrodionov@splicemachine.com>:
>
>> HBase TableInputFormat creates one input split per region. You cannot
>> achieve a high level of parallelism unless you have at least 5-10 regions
>> per RS. What does that mean? You probably have too few regions. You can
>> verify that in the HBase Web UI.
>>
>> -Vladimir Rodionov
>>
>> On Mon, Sep 29, 2014 at 7:21 PM, Tao Xiao <xiaotao.cs.nju@gmail.com> wrote:
>>
>>> I submitted a job in yarn-client mode which simply reads from an HBase
>>> table containing tens of millions of records and then does a *count* action.
>>> The job runs for a much longer time than I expected, so I wonder whether
>>> that is because there is too much data to read. Actually, there are 20 nodes
>>> in my Hadoop cluster, so the HBase table does not seem that big (tens of
>>> millions of records).
>>>
>>> I'm using CDH 5.0.0 (Spark 0.9 and HBase 0.96).
>>>
>>> BTW, when the job was running, I can see logs on the console, and
>>> specifically I'd like to know what the following log lines mean:
>>>
>>> 14/09/30 09:45:20 INFO scheduler.TaskSetManager: Starting task 0.0:20 as TID 20 on executor 2: b04.jsepc.com (PROCESS_LOCAL)
>>> 14/09/30 09:45:20 INFO scheduler.TaskSetManager: Serialized task 0.0:20 as 13454 bytes in 0 ms
>>> 14/09/30 09:45:20 INFO scheduler.TaskSetManager: Finished TID 19 in 16426 ms on b04.jsepc.com (progress: 18/86)
>>> 14/09/30 09:45:20 INFO scheduler.DAGScheduler: Completed ResultTask(0, 19)
>>>
>>> Thanks
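(For reference, a minimal sketch of the kind of read-and-count job discussed above,
assuming the Spark 0.9 / HBase 0.96 APIs and the C_CONS table from the listing; the
object name and master URL are illustrative. It shows why parallelism is capped by the
region count: TableInputFormat produces one input split, and therefore one Spark
partition and one task, per region -- the "progress: 18/86" in the log corresponds to
86 such tasks.

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark.SparkContext

    object HBaseCount {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("yarn-client", "hbase-count")

        // TableInputFormat reads the table name from the Hadoop configuration.
        val hbaseConf = HBaseConfiguration.create()
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "C_CONS")

        // One input split (and therefore one Spark partition / task) per region:
        // a table with ~86 regions yields ~86 tasks regardless of cluster size.
        val rdd = sc.newAPIHadoopRDD(
          hbaseConf,
          classOf[TableInputFormat],
          classOf[ImmutableBytesWritable],
          classOf[Result])

        println("partitions = " + rdd.partitions.length)  // roughly the region count
        println("rows       = " + rdd.count())
      }
    }

Repartitioning the RDD afterwards (rdd.repartition(n)) only spreads downstream work
across more tasks; the scan itself still runs one task per region, which is why the
suggestion above is to have more regions, on the order of 5-10 per region server.)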