hadoop-hdfs-user mailing list archives

From Alex Baranau <alex.barano...@gmail.com>
Subject hadoop fs -du & hbase table size
Date Mon, 14 Mar 2011 14:34:40 GMT

As far as I understand, since the "hadoop fs -du" command uses Linux's "du"
internally, the number of replicas (at the moment the command runs) affects
the result. Is that correct?
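To make the question concrete, here is a minimal sketch of the two ways a size could be reported. It assumes, as the question does, that the reported figure might count every live replica. Blocks are modelled as hypothetical (size, live replica count) pairs, and the sizes are illustrative values, not numbers from the cluster described below.

```python
# Hypothetical model: each block is (size_in_bytes, live_replica_count).
# The values are illustrative only, not taken from a real cluster.

MB = 1024 * 1024

def logical_size(blocks):
    """Size counting each block once, regardless of how many replicas exist."""
    return sum(size for size, _ in blocks)

def raw_size(blocks):
    """Size counting every live replica (the behaviour the question assumes)."""
    return sum(size * replicas for size, replicas in blocks)

# Replication factor 2, but the third block has temporarily lost a replica.
blocks = [(64 * MB, 2), (64 * MB, 2), (10 * MB, 1)]

print(logical_size(blocks) // MB)  # 138
print(raw_size(blocks) // MB)      # 266
```

Only the second figure moves when a replica disappears or an extra one appears, which is why the answer to the question above determines whether node restarts can show up on a size graph at all.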

I have the following case.
I have a small test HBase cluster (1 master + 5 slaves, each running a DN, TT
and RS) with replication set to 2. The tables' data size is monitored with
the "hadoop fs -du" command. One table is constantly written to: data is
only ever added to it.
At some point I decided to reconfigure one of the slaves, so I shut it down.
After reconfiguring it (by which time HBase had already marked it as dead), I
brought it up again. Everything went smoothly. However, on the table size
graph (drawn from data fetched with the "hadoop fs -du" command) I noticed a
small spike up in data size, after which it went back down to the
normal/expected values. Could it be that at some point during the procedure
of taking the node out, reconfiguring it and adding it back, blocks were
over-replicated? I'd expect them to be under-replicated for some time (while
the DN is down), so I'd expect to see an inverted spike: a small decrease in
data size, then a return to the "expected" rate once all blocks had been
replicated again. Any ideas?
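The sequence described above can be sketched as a toy timeline. This is a hypothetical model, not HDFS code: it assumes the reported size counts every live replica, uses made-up block counts and a fixed 64 MB block size, ignores the ongoing writes to the table, and assumes the NameNode re-replicates while the node is down and later trims the extra replicas once it returns.

```python
# Toy model of a DataNode leaving and rejoining; all numbers are hypothetical.
# Assumes the reported size counts every live replica (the question's premise).

REPLICATION = 2
BLOCK_MB = 64

def raw_mb(replica_counts):
    """Total MB when every live replica of every block is counted."""
    return sum(count * BLOCK_MB for count in replica_counts)

def simulate(num_blocks=10, blocks_on_stopped_dn=4):
    counts = [REPLICATION] * num_blocks
    timeline = [("steady", raw_mb(counts))]

    # 1. The DataNode stops: blocks with a replica on it lose one.
    for i in range(blocks_on_stopped_dn):
        counts[i] -= 1
    timeline.append(("dn down", raw_mb(counts)))

    # 2. Under-replicated blocks get copied to other nodes.
    counts = [max(c, REPLICATION) for c in counts]
    timeline.append(("re-replicated", raw_mb(counts)))

    # 3. The DataNode rejoins and reports its old replicas again.
    for i in range(blocks_on_stopped_dn):
        counts[i] += 1
    timeline.append(("dn back", raw_mb(counts)))

    # 4. Excess replicas are removed, back to the replication factor.
    counts = [min(c, REPLICATION) for c in counts]
    timeline.append(("excess removed", raw_mb(counts)))
    return timeline

for phase, mb in simulate():
    print(f"{phase:15} {mb} MB")
```

In this model the size first dips (1280 to 1024 MB) and later rises above normal (to 1536 MB) before settling at 1280 MB again. Which of the two a graph actually shows depends on when its samples are taken relative to each phase.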

Thank you,

Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
