Subject: DFS maxing out on single datadir
From: highpointe <highpointe3i@gmail.com>
Date: Wed, 18 May 2011 19:37:35 -0600
To: hdfs-user@hadoop.apache.org

System: HDFS data dirs across the cluster are dataone, datatwo, datathree.

I recently had an issue where I lost a slave, which resulted in a large number of under-replicated blocks.

Re-replication was quite slow on the uptake, so I thought running the hadoop balancer would help. This seemed to exacerbate the situation, so I killed the balancer.

Hadoop then proceeded to write all new data to dataone on each slave. It would wait until the dataone dir was at 100% full, then move on to the next slave in sequence. datatwo and datathree were completely ignored.

DFS showed <10% free and was dropping quickly.

I ended up restarting the entire cluster (DFS and MapRed) and things started acting normally again (writing to all three data dirs).

Has anyone experienced this, or have any idea why it would happen?

Thanks for the help.

Sent from my iPhone
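For context, the expected behavior is that each datanode round-robins new blocks across every directory listed in dfs.data.dir, so all three dirs should fill roughly evenly. A minimal sketch of the relevant hdfs-site.xml entry, assuming the three dirs are mounted under a hypothetical /hadoop prefix (paths are illustrative, not from the original message):

```xml
<!-- hdfs-site.xml: comma-separated list of local dirs the datanode
     stores blocks in; the datanode rotates writes across them.
     The /hadoop/* paths below are placeholders. -->
<property>
  <name>dfs.data.dir</name>
  <value>/hadoop/dataone,/hadoop/datatwo,/hadoop/datathree</value>
</property>
```

If writes stick to the first entry only, it can be worth checking (e.g. with `hadoop fsck /` and the datanode logs) whether the later dirs were marked failed or unwritable, since a dir dropped from the rotation is silently skipped until restart.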