hadoop-general mailing list archives

From Christoph Rupp <ch...@crupp.de>
Subject FSDataInputStream.read() returns -1 when reading existing file
Date Tue, 18 Dec 2012 10:41:00 GMT
Hi,

I'm having a problem with an application that reads files from HDFS:
FSDataInputStream.read() returns -1, although the file exists. I am using
Hadoop 2.0.0-cdh4.1.2. Here are the details:

One of the machines (TestNN) runs the NameNode. The other three (TestDN1 -
TestDN3) run the DataNodes.
The DataNode machines have processes which use the HDFS API to create
files, append data, then close the files.

TestNN runs another process which fetches a directory listing, then opens and
reads all files that it can find. This works fine for some files but
consistently fails for others.

In my tests I write a lot of debug info to the log file. The logs show the
following sequence:

TestDN1:
  Creates a file
  FSDataOutputStream.write() of 100 bytes
  FSDataOutputStream.sync()
  FSDataOutputStream.write() of 34 bytes
  FSDataOutputStream.sync()
  Closes the file
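
For reference, in API terms the write side boils down to something like the
following sketch (the /files/user/0 path is taken from the NameNode log below
and the byte counts from the list above; the class name and everything else is
my assumption). hflush() is the non-deprecated 2.x equivalent of the sync()
call above:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class WriterSketch {
      public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          Path file = new Path("/files/user/0");   // path taken from the NameNode log below

          FSDataOutputStream out = fs.create(file, true /* overwrite */);
          out.write(new byte[100]);   // first write of 100 bytes
          out.hflush();               // corresponds to the sync() call above
          out.write(new byte[34]);    // second write of 34 bytes
          out.hflush();
          out.close();                // the file is closed before the reader opens it
          fs.close();
      }
  }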

Afterwards, the application on TestNN starts:

TestNN:
  Opens the file
  Retrieves the file length (with DFSDataInputStream.getVisibleLength()); it returns 134
  Reads the first 26 bytes (they contain a header structure); the read fails with -1

TestNN is not able to recover; my program loops forever, and read() keeps
returning -1.
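
For reference, a minimal sketch of what the reader does (the /files/user/0
path and the 26-byte header size are taken from the description above; the
class and variable names are mine, and instead of looping forever this version
fails as soon as read() returns -1):

  import java.io.EOFException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ReaderSketch {
      public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          Path file = new Path("/files/user/0");

          // My code uses DFSDataInputStream.getVisibleLength(); FileStatus.getLen()
          // is shown here as a portable stand-in for the reported 134-byte length.
          long length = fs.getFileStatus(file).getLen();
          System.out.println("reported length: " + length);

          FSDataInputStream in = fs.open(file);
          byte[] header = new byte[26];              // the header structure
          int off = 0;
          while (off < header.length) {
              int n = in.read(header, off, header.length - off);
              if (n < 0) {
                  // This is the failure described above: read() returns -1 although
                  // the file exists and its reported length is 134 bytes.
                  throw new EOFException("read() returned -1 after " + off + " bytes");
              }
              off += n;
          }
          in.close();
          fs.close();
      }
  }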

Note that the file was actually closed before it was reopened on another
node. But I run into similar problems if TestDN1 is still appending data while
TestNN has already opened the file.

Additional information:
 - the cluster has 4 nodes, but I see the same failures when running a
larger cluster
 - the load on the cluster is low
 - my applications try to read and write multiple files at the same time,
but usually not more than 3 at a time
 - I tried the same on a cluster with CDH3u5 and ran into the same issues
 - the situation does not improve if TestNN runs a DataNode
 - in the case above, no bytes were read from the file, but I think I saw
some test cases where the file was opened and data was read successfully
until suddenly -1 was returned

I have copied the logs from the NameNode and the DataNode below; I am not
sure if they are relevant.

Any help is appreciated.

bye
Christoph

Here is the NameNode log on TestNN:

2012-12-18 00:49:42,757 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.allocateBlock: /files/user/0.
BP-933181447-192.168.17.19-1355408853672
blk_-3327394396925549856_2192{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1,replicas=[ReplicaUnderConstruction[192.168.17.20:50010
|RBW],
ReplicaUnderConstruction[192.168.17.22:50010|RBW],
ReplicaUnderConstruction[192.168.17.21:50010|RBW]]}
2012-12-18 00:49:42,774 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.fsync: file /files/user/0 for
DFSClient_NONMAPREDUCE_-1901701444_1
2012-12-18 00:49:42,817 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
addStoredBlock: blockMap updated: 192.168.17.21:50010 is added to
blk_-3327394396925549856_2192{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[192.168.17.20:50010
|RBW],
ReplicaUnderConstruction[192.168.17.22:50010|RBW],
ReplicaUnderConstruction[192.168.17.21:50010|RBW]]} size 0
2012-12-18 00:49:42,818 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
addStoredBlock: blockMap updated: 192.168.17.22:50010 is added to
blk_-3327394396925549856_2192{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[192.168.17.20:50010
|RBW],
ReplicaUnderConstruction[192.168.17.22:50010|RBW],
ReplicaUnderConstruction[192.168.17.21:50010|RBW]]} size 0
2012-12-18 00:49:42,820 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
addStoredBlock: blockMap updated: 192.168.17.20:50010 is added to
blk_-3327394396925549856_2192{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[192.168.17.20:50010
|RBW],
ReplicaUnderConstruction[192.168.17.22:50010|RBW],
ReplicaUnderConstruction[192.168.17.21:50010|RBW]]} size 0
2012-12-18 00:49:42,825 INFO org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.completeFile: file /files/user/0 is closed by
DFSClient_NONMAPREDUCE_-1901701444_1


This is the DataNode log on TestDN:

2012-12-18 00:49:41,156 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.17.21:57594, dest: /192.168.17.20:50010, bytes: 65974272, op:
HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1484565721_1, offset: 0, srvID:
DS-1951213048-192.168.17.20-50010-1355408989152, blockid:
BP-933181447-192.168.17.19-1355408853672:blk_4084139446972877436_2188,
duration: 1130470196
2012-12-18 00:49:41,156 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode:
PacketResponder:
BP-933181447-192.168.17.19-1355408853672:blk_4084139446972877436_2188,
type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2012-12-18 00:49:41,514 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.17.22:53868, dest: /192.168.17.20:50010, bytes: 17936011, op:
HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1484565721_1, offset: 0, srvID:
DS-1951213048-192.168.17.20-50010-1355408989152, blockid:
BP-933181447-192.168.17.19-1355408853672:blk_2584695448116588511_2189,
duration: 1132434974
2012-12-18 00:49:41,514 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode:
PacketResponder:
BP-933181447-192.168.17.19-1355408853672:blk_2584695448116588511_2189,
type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2012-12-18 00:49:41,632 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.17.20:50010, dest: /192.168.17.20:59779, bytes: 516, op: HDFS_READ,
cliID: DFSClient_NONMAPREDUCE_1606828785_1, offset: 65973760, srvID:
DS-1951213048-192.168.17.20-50010-1355408989152, blockid:
BP-933181447-192.168.17.19-1355408853672:blk_4084139446972877436_2188,
duration: 196317
2012-12-18 00:49:42,118 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode:
Receiving block
BP-933181447-192.168.17.19-1355408853672:blk_8228725710443376769_2191 src:
/192.168.17.22:53870 dest: /192.168.17.20:50010
2012-12-18 00:49:42,368 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.17.20:50010, dest: /192.168.17.20:59780, bytes: 18076139, op:
HDFS_READ, cliID: DFSClient_NONMAPREDUCE_-1901701444_1, offset: 0, srvID:
DS-1951213048-192.168.17.20-50010-1355408989152, blockid:
BP-933181447-192.168.17.19-1355408853672:blk_2584695448116588511_2189,
duration: 650411212
2012-12-18 00:49:42,768 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode:
Receiving block
BP-933181447-192.168.17.19-1355408853672:blk_-3327394396925549856_2192 src:
/192.168.17.20:59781 dest: /192.168.17.20:50010
2012-12-18 00:49:42,827 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/192.168.17.20:59781, dest: /192.168.17.20:50010, bytes: 134, op:
HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1901701444_1, offset: 0, srvID:
DS-1951213048-192.168.17.20-50010-1355408989152, blockid:
BP-933181447-192.168.17.19-1355408853672:blk_-3327394396925549856_2192,
duration: 51772773
2012-12-18 00:49:42,827 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder:
BP-933181447-192.168.17.19-1355408853672:blk_-3327394396925549856_2192,
type=HAS_DOWNSTREAM_IN_PIPELINE terminating
