hadoop-mapreduce-user mailing list archives

From Alex Clark <a...@bitstew.com>
Subject Reading remote HDFS file with Java Client
Date Sat, 26 Apr 2014 05:58:55 GMT
Hello all, I'm having a bit of trouble with a simple Hadoop install.  I've
downloaded hadoop 2.4.0 and installed on a single CentOS Linux node (Virtual
Machine).  I've configured hadoop for a single node with pseudo distribution
as described on the apache site (…leCluster.html).  It starts with no issues
in the logs and I can read + write files using the "hadoop fs" commands from
the command line.

I'm attempting to read a file from the HDFS on a remote machine with the
Java API.  The machine can connect and list directory contents.  It can also
determine if a file exists with the code:

Path p=new Path("hdfs://test.server:9000/usr/test/test_file.txt");
FileSystem fs = FileSystem.get(new Configuration());
System.out.println(p.getName() + " exists: " + fs.exists(p));

The system prints "true" indicating it exists.  However, when I attempt to
read the file with:

BufferedReader br = null;
try {
    Path p = new Path("hdfs://test.server:9000/usr/test/test_file.txt");
    FileSystem fs = FileSystem.get(CONFIG);
    System.out.println(p.getName() + " exists: " + fs.exists(p));

    br = new BufferedReader(new InputStreamReader(fs.open(p)));
    String line = br.readLine();
    while (line != null) {
        System.out.println(line);
        line = br.readLine();
    }
} finally {
    if (br != null) br.close();
}

this code throws the exception:

Exception in thread "main" org.apache.hadoop.hdfs.BlockMissingException:
Could not obtain block:
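
For context on why the exists-check succeeds while the read fails: as I
understand it, exists() and directory listings only talk to the NameNode,
while open()/read streams block data directly from a DataNode at whatever
address the NameNode advertises.  If that address is one the remote client
can't reach (e.g. an internal VM IP), the client could see
BlockMissingException even though the block is fine on disk.  One thing I may
still try, purely as a guess on my part, is telling the client to connect to
DataNodes by hostname instead of by the advertised IP:

```xml
<!-- client-side hdfs-site.xml; dfs.client.use.datanode.hostname is an
     HDFS 2.x client property, but whether it applies to my setup is only
     my assumption -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```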

Googling turned up some possible causes, but they all checked out fine: the
data node is connected, active, and has enough space.  The admin report from
hdfs dfsadmin -report shows:

Configured Capacity: 52844687360 (49.22 GB)
Present Capacity: 48507940864 (45.18 GB)
DFS Remaining: 48507887616 (45.18 GB)
DFS Used: 53248 (52 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: (test.server)
Hostname: test.server
Decommission Status : Normal
Configured Capacity: 52844687360 (49.22 GB)
DFS Used: 53248 (52 KB)
Non DFS Used: 4336746496 (4.04 GB)
DFS Remaining: 48507887616 (45.18 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.79%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Fri Apr 25 22:16:56 PDT 2014

The client jars were copied directly from the hadoop install so no version
mismatch there.  I can browse the file system with my Java class and read
file attributes.  I just can't read the file contents without getting the
exception.  If I try to write a file with the code:

FileSystem fs = null;
BufferedWriter br = null;

System.setProperty("HADOOP_USER_NAME", "root");

try {
    fs = FileSystem.get(new Configuration());

    //Path p = new Path(dir, file);
    Path p = new Path("hdfs://test.server:9000/usr/test/test.txt");
    br = new BufferedWriter(new OutputStreamWriter(fs.create(p, true)));
    br.write("Hello World");
} finally {
    if (br != null) br.close();
    if (fs != null) fs.close();
}

this creates the file but doesn't write any bytes and throws the exception:

Exception in thread "main"
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/usr/test/test.txt could only be replicated to 0 nodes instead of
minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are
excluded in this operation.

Googling for this indicated a possible space issue but, from the dfsadmin
report, it seems there is plenty of space.  This is a plain vanilla install
and I can't get past this issue.
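
To help narrow this down, these are the checks I can run next (assuming the
default Hadoop 2.x DataNode data-transfer port 50010, which I have not
changed):

```shell
# From the client machine: is the DataNode's data-transfer port reachable?
telnet test.server 50010

# On the server: are the file's blocks healthy, and which DataNode
# addresses does the NameNode advertise for them?
hdfs fsck /usr/test/test_file.txt -files -blocks -locations
```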

The environment summary is:

Server: CentOS 6.5 Virtual Machine, 64 bit
  Hadoop 2.4.0 with pseudo-distribution
  Java 1.7.0_55

Client: Windows 8 (Virtual Machine)
  Java 1.7.0_51
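
One more variant I plan to try, in case it's relevant (just a sketch, using
the same test.server URI as above): binding the FileSystem to the path's own
URI via FileSystem.get(URI, Configuration), rather than relying on
fs.defaultFS from core-site.xml on the client classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExistsCheck {
    public static void main(String[] args) throws Exception {
        // Bind the FileSystem to the scheme/authority of the path itself,
        // instead of resolving it from fs.defaultFS on the classpath.
        Path p = new Path("hdfs://test.server:9000/usr/test/test_file.txt");
        FileSystem fs = FileSystem.get(p.toUri(), new Configuration());
        System.out.println(p.getName() + " exists: " + fs.exists(p));
    }
}
```

This still needs the cluster reachable from the client, of course, so I
don't expect it to fix the block read on its own.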

Any help is greatly appreciated.
