Date: Fri, 4 Jun 2010 22:40:22 +0300
Message-ID: <AANLkTimmmqBiF4NFxK6vRid8GQaAp_ZJY0ZwXplA_568@mail.gmail.com>
Subject: Problematic disk in a datanode
From: Oded Rosen <oded@legolas-media.com>
To: general@hadoop.apache.org

Hey,

A while ago we added a new disk (volume) to every datanode in our
cluster.
We configured the new disks in "dfs.data.dir" in hdfs-site.xml, both on the
jobtracker and on each machine.
This worked on all of the machines except one, where Hadoop did not
recognize the new disk.
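Since the other machines picked up their new disks, one way to rule out a configuration difference on the problem machine is to check what its own hdfs-site.xml actually lists. In Hadoop 0.20, dfs.data.dir takes a comma-separated list of directories. The Python sketch below parses a configuration file's text and returns that list. The /disk1, /disk2 and /disk3 paths are hypothetical placeholders, not real mount points.

```python
import xml.etree.ElementTree as ET


def data_dirs(hdfs_site_xml: str, prop: str = "dfs.data.dir") -> list[str]:
    """Return the directories listed (comma-separated) for `prop`, or [] if unset."""
    root = ET.fromstring(hdfs_site_xml)
    for p in root.findall("property"):
        if p.findtext("name", "").strip() == prop:
            value = p.findtext("value", "")
            # Split on commas and drop surrounding whitespace and empty entries.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


# Hypothetical hdfs-site.xml content; the /diskN paths are placeholders.
example = """<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/disk1/hdfs/data,/disk2/hdfs/data, /disk3/hdfs/data</value>
  </property>
</configuration>"""

print(data_dirs(example))
```

Running it against the file on the problem machine (read with `open(...).read()`) and comparing the result with a working machine would show whether the new directory is listed there at all.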

We cannot figure out what's wrong with it.

We know the new disk is not recognized because the namenode web UI at
"http://namenode:50070/" shows a smaller capacity for that machine.
The mapred and hdfs directories on that drive exist, but their structure
differs from the directories on the other disks:
on the problematic drive there is no "local" directory under "mapred", and
there are no "name" or "namesecondary" directories under "hdfs".
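The layout difference described above can be checked mechanically across all disks of a machine. A minimal Python sketch, assuming each disk's Hadoop directories sit under one root directory per disk (the roots are hypothetical inputs): it walks each root two levels deep and reports, per disk, the subdirectories that exist on some other disk but not on that one. The demo rebuilds the layout from this message in a temporary directory.

```python
import os
import tempfile


def subdirs(root_dir: str, depth: int = 2) -> set[str]:
    """Relative paths of directories under root_dir, at most `depth` levels deep."""
    found = set()
    for root, dirs, _files in os.walk(root_dir):
        rel = os.path.relpath(root, root_dir)
        level = 0 if rel == "." else rel.count(os.sep) + 1
        if level >= depth:
            dirs[:] = []  # prune: children would be deeper than `depth`
            continue
        for d in dirs:
            found.add(os.path.normpath(os.path.join(rel, d)))
    return found


def missing_dirs(disks: list[str], depth: int = 2) -> dict[str, set[str]]:
    """For each disk, the subdirectories found on another disk but not on this one."""
    layouts = {disk: subdirs(disk, depth) for disk in disks}
    union = set().union(*layouts.values())
    return {disk: union - layout for disk, layout in layouts.items() if union - layout}


if __name__ == "__main__":
    # Hypothetical layout mirroring the message: disk1 is healthy, disk2 is the new disk.
    with tempfile.TemporaryDirectory() as tmp:
        good, new = os.path.join(tmp, "disk1"), os.path.join(tmp, "disk2")
        for sub in ("mapred/local", "hdfs/name", "hdfs/namesecondary"):
            os.makedirs(os.path.join(good, sub))
        for sub in ("mapred", "hdfs"):
            os.makedirs(os.path.join(new, sub))
        for disk, missing in missing_dirs([good, new]).items():
            print(os.path.basename(disk), "is missing:", sorted(missing))
```

On a POSIX system the demo prints `disk2 is missing: ['hdfs/name', 'hdfs/namesecondary', 'mapred/local']`.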

This was not a serious problem until now, but the other disks are now
full:
the logs have started showing errors such as "No space left on device" and
"DiskErrorException: Could not find any valid local directory for
taskTracker/jobcache/".
Some Hadoop jobs fail with the same errors, and the datanode and tasktracker
on that machine crash frequently.
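Given the "No space left on device" errors, it may help to confirm how much space each configured directory really has and which filesystem it lives on. One possible cause (an assumption, not a diagnosis) is that the new disk was not mounted at its mount point when its directories were created, so those directories sit on a different filesystem than intended. The Python sketch below reports, for each directory, the mount point it resolves to and the free and total space there. The directory list in the demo is a placeholder to be replaced with the real dfs.data.dir and mapred.local.dir entries.

```python
import os
import shutil
import tempfile


def mount_point(path: str) -> str:
    """Walk up from `path` until reaching the mount point of its filesystem."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def volume_report(dirs: list[str]) -> dict[str, tuple[str, int, int]]:
    """Map each directory to (mount point, free bytes, total bytes)."""
    report = {}
    for d in dirs:
        usage = shutil.disk_usage(d)
        report[d] = (mount_point(d), usage.free, usage.total)
    return report


if __name__ == "__main__":
    # Hypothetical list: substitute the directories configured in
    # dfs.data.dir and mapred.local.dir on the problem machine.
    dirs = [tempfile.gettempdir()]
    for d, (mnt, free, total) in volume_report(dirs).items():
        print(f"{d}: mounted at {mnt}, {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
```

If the new disk's directory reports the root filesystem ("/") as its mount point instead of the disk's own mount point, the directories were likely created before the disk was mounted.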

How do we install this disk properly?

Thanks in advance.

Technical info: Hadoop 0.20 on CentOS. Each machine is a datanode and
tasktracker (a separate machine runs the jobtracker and namenode).

-- 
Oded
