hadoop-hdfs-issues mailing list archives

From Hajo Nils Krabbenhöft (JIRA) <j...@apache.org>
Subject [jira] Commented: (HDFS-1459) NullPointerException in DataInputStream.readInt
Date Sun, 24 Oct 2010 10:54:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924306#action_12924306 ]

Hajo Nils Krabbenhöft commented on HDFS-1459:
---------------------------------------------

I found this in my datanode logs:

2010-10-20 15:31:17,154 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.17.5.3:50010,
storageID=DS-266784496-78.46.65.54-50010-1287004808819, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 257 exceeds the limit of concurrent xcievers 256
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
	at java.lang.Thread.run(Thread.java:619)

2010-10-20 15:31:19,115 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.17.5.3:50010,
storageID=DS-266784496-78.46.65.54-50010-1287004808819, infoPort=50075, ipcPort=50020):Got
exception while serving blk_-8099607957427967059_1974 to /10.17.5.4:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready
for write. ch : java.nio.channels.SocketChannel[connected local=/10.17.5.3:50010 remote=/10.17.5.4:51336]
	at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
	at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
	at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:401)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
	at java.lang.Thread.run(Thread.java:619)
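
For what it's worth, the first trace suggests the datanode simply enforces a hard cap on
concurrent DataXceiver threads. A toy paraphrase of that check, reconstructed from the log
message above (names and structure are guesses, not the actual 0.20 DataXceiver source):

import java.io.IOException;

// Toy paraphrase of the cap implied by the "xceiverCount ... exceeds the limit"
// message; names and structure are guesses, not the real DataXceiver code.
public class XcieverLimitDemo {
	static void checkXceiverCount(int xceiverCount, int maxXceivers) throws IOException {
		if (xceiverCount > maxXceivers) {
			throw new IOException("xceiverCount " + xceiverCount
				+ " exceeds the limit of concurrent xcievers " + maxXceivers);
		}
	}

	public static void main(String[] args) throws IOException {
		checkXceiverCount(257, 256); // reproduces the message seen in the log above
	}
}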

So far, using the following configuration snippet seems to fix the problem:

<property>
  <name>dfs.datanode.handler.count</name>
  <value>40</value>
  <description>The number of server threads for the datanode.</description>
</property>

<property>
  <name>dfs.namenode.handler.count</name>
  <value>40</value>
  <description>The number of server threads for the namenode.</description>
</property>

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2048</value>
  <description>The maximum number of threads that can be connected to a
  datanode simultaneously. Default value is 256.
  </description>
</property>
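
For reference, these properties go into hdfs-site.xml (or hadoop-site.xml on older setups)
and the daemons have to be restarted to pick them up. A throwaway sketch to double-check
which values a node actually ends up with (the class name is mine; assumes the config files
are on the classpath):

import org.apache.hadoop.conf.Configuration;

// Throwaway check: prints the thread limits as the standard Configuration
// loader sees them. Class name is an assumption, not anything shipped with Hadoop.
public class PrintDatanodeLimits {
	public static void main(String[] args) {
		Configuration conf = new Configuration(); // picks up core-site.xml from the classpath
		conf.addResource("hdfs-site.xml");        // hdfs-site.xml has to be added explicitly here
		System.out.println("dfs.datanode.max.xcievers  = "
			+ conf.getInt("dfs.datanode.max.xcievers", 256));
		System.out.println("dfs.datanode.handler.count = "
			+ conf.getInt("dfs.datanode.handler.count", 3));
		System.out.println("dfs.namenode.handler.count = "
			+ conf.getInt("dfs.namenode.handler.count", 10));
	}
}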


So the underlying problem seems to be that when the max xcievers limit is reached, the client
does not get notified and thus reports unusable error messages.
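
Until the datanode reports this condition back properly, a client-side workaround is to catch
the NPE close to the read and re-throw it with a hint about the xciever limit. A rough sketch
(the wrapper class, method name and retry budget are mine, not anything in HDFS):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

// Hypothetical client-side guard: retries a positioned int read so that a
// transient xciever exhaustion on a datanode surfaces as a descriptive
// IOException instead of a bare NullPointerException.
public final class GuardedReader {
	public static int readIntAt(FSDataInputStream in, long pos) throws IOException {
		final int maxAttempts = 3; // assumed retry budget, tune as needed
		IOException last = null;
		for (int attempt = 1; attempt <= maxAttempts; attempt++) {
			try {
				in.seek(pos);
				return in.readInt();
			} catch (NullPointerException npe) {
				// The symptom from this issue: blockReader ends up null inside DFSClient.
				last = new IOException("readInt at offset " + pos + " failed on attempt "
					+ attempt + "; datanode may have hit dfs.datanode.max.xcievers");
			} catch (IOException ioe) {
				last = ioe;
			}
		}
		throw last;
	}
}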

> NullPointerException in DataInputStream.readInt
> -----------------------------------------------
>
>                 Key: HDFS-1459
>                 URL: https://issues.apache.org/jira/browse/HDFS-1459
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>         Environment: Debian 64 bit
> Cloudera Hadoop
>            Reporter: Hajo Nils Krabbenhöft
>
> First, here's my source code accessing the HDFS:
> final FSDataInputStream indexFile = getFile(bucketPathStr, Integer.toString(hashTableId) + ".index");
> indexFile.seek(bucketId * 4);
> int bucketStart = ByteSwapper.swap(indexFile.readInt());
> int bucketEnd = ByteSwapper.swap(indexFile.readInt());
> final FSDataInputStream dataFile = getFile(bucketPathStr, Integer.toString(hashTableId) + ".data");
> dataFile.seek(bucketStart * (2 + Hasher.getConfigHashLength()) * 4);
> for (int hash = bucketStart; hash < bucketEnd; hash++) {
> 	int RimageIdA = ByteSwapper.swap(dataFile.readInt());
> 	int RimageIdB = ByteSwapper.swap(dataFile.readInt());
> 	....... read hash of length Hasher.getConfigHashLength() and work with it ....
> }
> As you can see, I am reading the range to be read from an X.index file and then reading
those rows from X.data. The index file is always exactly 6,710,888 bytes in length.
> As for the data file, everything works fine with 50 different 1.35 GB (22 blocks) data
files, and it fails every time I try it with 50 different 2.42 GB (39 blocks) data files. So
the cause of the bug is clearly dependent on the file size.
> I checked the ulimit and the number of network connections, and neither is maxed out
when the error occurs. The stack trace I get is:
> java.lang.NullPointerException
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1703)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1755)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1680)
> 	at java.io.DataInputStream.readInt(DataInputStream.java:370)
> ...
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
> which leads me to believe that DFSClient.blockSeekTo returns with a non-null chosenNode
but with blockReader = null.
> Since the exact same jar works flawlessly with small data files and fails reliably with
big data files, I'm wondering how this could possibly depend on the file's size or block
count (DFSClient.java line 1628+):
> s = socketFactory.createSocket();
> NetUtils.connect(s, targetAddr, socketTimeout);
> s.setSoTimeout(socketTimeout);
> Block blk = targetBlock.getBlock();
> blockReader = BlockReader.newBlockReader(s, src, blk.getBlockId(), 
>     blk.getGenerationStamp(),
>     offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
>     buffersize, verifyChecksum, clientName);
> return chosenNode;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

