Reply-To: user@hbase.apache.org
Date: Mon, 14 Mar 2011 16:34:40 +0200
Message-ID: <AANLkTima8u9pot7TVhJO+xeW=KVYU4YmmAoxW=2aj-Kz@mail.gmail.com>
Subject: hadoop fs -du & hbase table size
From: Alex Baranau <alex.baranov.v@gmail.com>
To: user@hbase.apache.org, hdfs-user@hadoop.apache.org

Hello,

As far as I understand, since the "hadoop fs -du" command uses the Linux "du"
internally, the number of replicas (at the moment the command runs) affects
the result. Is that correct?

I have the following case.
I have a small test HBase cluster (1 master + 5 slaves, each running a DN,
TT & RS) with replication set to 2. The table data size is monitored with
the "hadoop fs -du" command. One table is constantly written to: data is
only ever added to it.
At some point I decided to reconfigure one of the slaves, so I shut it down.
After reconfiguring it (by which time HBase had already marked it as dead),
I brought it back up. Things went smoothly. However, on the table size graph
(which I drew from data fetched with the "hadoop fs -du" command) I noticed a
small spike up in data size, after which it went back down to the
normal/expected values. Could it be that at some point during the
take-out/reconfigure/add-back procedure some blocks were over-replicated?
I'd expect them to be under-replicated for some time (while the DN is down),
so I'd expect to see the inverted spike: a small decrease in data amount and
then a return to the "expected" rate (after all blocks are replicated again).
Any ideas?
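To make the two expectations concrete, here is a toy model. It assumes, as the question supposes, that the size reported by "hadoop fs -du" is the sum of each block's length times its number of live replicas. The block sizes, replica counts and the raw_size helper are all hypothetical, chosen only to show that a missing replica (DN down) lowers the total, while a temporary extra replica (a block re-replicated elsewhere, then the old copy reappearing when the DN returns) raises it:

```python
from typing import List

MB = 1024 * 1024


def raw_size(block_lengths: List[int], live_replicas: List[int]) -> int:
    """Toy model (assumption): bytes counted across all live replicas of each block."""
    if len(block_lengths) != len(live_replicas):
        raise ValueError("one replica count per block is required")
    return sum(length * replicas for length, replicas in zip(block_lengths, live_replicas))


# Hypothetical table: four 64 MB blocks, replication factor 2.
blocks = [64 * MB] * 4

steady = raw_size(blocks, [2, 2, 2, 2])     # every block fully replicated
node_down = raw_size(blocks, [1, 2, 1, 2])  # replicas held by the stopped DN are missing
node_back = raw_size(blocks, [3, 2, 3, 2])  # blocks re-replicated, then the DN's old replicas reappear

print(steady // MB, node_down // MB, node_back // MB)  # prints "512 384 640"
```

Under this assumption the dip would come from under-replication and a spike could only come from blocks temporarily holding more live replicas than the configured factor.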

Thank you,

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
