hadoop-common-user mailing list archives

From Raj V <rajv...@yahoo.com>
Subject Re: Too many open files Error
Date Thu, 26 Jan 2012 19:40:02 GMT
Mark

Your datanode log shows "Connection reset by peer". Why do you think this problem is related to too
many open files?

Raj



>________________________________
> From: Mark question <markq2011@gmail.com>
>To: common-user@hadoop.apache.org 
>Sent: Thursday, January 26, 2012 11:10 AM
>Subject: Re: Too many open files Error
> 
>Hi again,
>I've tried:
>     <property>
>        <name>dfs.datanode.max.xcievers</name>
>        <value>1048576</value>
>      </property>
>but I'm still getting the same error ... How high can I go?
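A minimal check, assuming a Linux datanode host with the JDK's jps tool on the PATH, of the file-descriptor limit the running DataNode actually has (raising dfs.datanode.max.xcievers does not change the OS-level limit); run it as the user that owns the DataNode process, or as root:

    DN_PID=$(jps | awk '/DataNode/ {print $1}')   # PID of the DataNode JVM
    grep 'open files' /proc/$DN_PID/limits        # per-process descriptor limit in effect
    ls /proc/$DN_PID/fd | wc -l                   # descriptors currently open by the DataNode

If the second command still reports a small limit such as 1024, the "too many open files" errors will persist regardless of the xcievers value.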
>
>Thanks,
>Mark
>
>
>
>On Thu, Jan 26, 2012 at 9:29 AM, Mark question <markq2011@gmail.com> wrote:
>
>> Thanks for the reply... I have nothing about dfs.datanode.max.xceivers in
>> my hdfs-site.xml, so hopefully setting it will solve the problem. As for
>> ulimit -n: I'm running on an NFS cluster, so I usually just start Hadoop
>> with a single bin/start-all.sh ... Do you think I can set it with something like
>> bin/Datanode -ulimit n ?
>>
>> Mark
>>
>>
>> On Thu, Jan 26, 2012 at 7:33 AM, Mapred Learn <mapred.learn@gmail.com> wrote:
>>
>>> You need to set ulimit -n <bigger value> on the datanodes and restart them.
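One common way to do that, sketched here under the assumption of a Linux host and a daemon account named hadoop (substitute mapred/hdfs and your own limit as appropriate); the limits.conf change needs root and takes effect at the next login of that user:

    # /etc/security/limits.conf (or a file under /etc/security/limits.d/):
    #   hadoop  soft  nofile  65536
    #   hadoop  hard  nofile  65536

    # then, from a fresh login as that user, verify and restart the daemons so
    # they inherit the new limit:
    ulimit -n                              # should now report 65536
    bin/stop-all.sh && bin/start-all.sh

As far as I know there is no -ulimit option on the datanode command itself; the limit has to come from the environment the daemon is started in.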
>>>
>>> Sent from my iPhone
>>>
>>> On Jan 26, 2012, at 6:06 AM, Idris Ali <psychidris@gmail.com> wrote:
>>>
>>> > Hi Mark,
>>> >
>>> > On a lighter note, what is the count of xceivers, i.e. the value of the
>>> > dfs.datanode.max.xceivers property in hdfs-site.xml?
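A quick way to see what the datanode is actually configured with (the conf path assumes a tarball install under $HADOOP_HOME). Note that the 0.20/1.x releases expect the historically misspelled key dfs.datanode.max.xcievers, as used in the snippet earlier in the thread; a key spelled differently is silently ignored:

    # the value line follows the name line in the snippet format used above
    grep -A1 'dfs.datanode.max.xcievers' $HADOOP_HOME/conf/hdfs-site.xml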
>>> >
>>> > Thanks,
>>> > -idris
>>> >
>>> > On Thu, Jan 26, 2012 at 5:28 PM, Michel Segel <michael_segel@hotmail.com> wrote:
>>> >
>>> >> Sorry, going from memory...
>>> >> As the hadoop, mapred, or hdfs user, what do you see when you do a ulimit -a?
>>> >> That should give you the number of open files allowed for a single user...
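For example (the account name below is an assumption; use whichever user actually owns the datanode process):

    sudo -u hdfs bash -lc 'ulimit -n'   # open-files limit as seen by the hdfs user
    sudo -u hdfs bash -lc 'ulimit -a'   # all per-process limits for that user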
>>> >>
>>> >>
>>> >> Sent from a remote device. Please excuse any typos...
>>> >>
>>> >> Mike Segel
>>> >>
>>> >> On Jan 26, 2012, at 5:13 AM, Mark question <markq2011@gmail.com> wrote:
>>> >>
>>> >>> Hi guys,
>>> >>>
>>> >>>  I get this error from a job trying to process 3 million records.
>>> >>>
>>> >>> java.io.IOException: Bad connect ack with firstBadLink 192.168.1.20:50010
>>> >>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2903)
>>> >>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
>>> >>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>>> >>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>>> >>>
>>> >>> When I checked the log file of datanode-20, I see:
>>> >>>
>>> >>> 2012-01-26 03:00:11,827 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>>> >>> DatanodeRegistration(192.168.1.20:50010, storageID=DS-97608578-192.168.1.20-50010-1327575205369,
>>> >>> infoPort=50075, ipcPort=50020):DataXceiver
>>> >>> java.io.IOException: Connection reset by peer
>>> >>>   at sun.nio.ch.FileDispatcher.read0(Native Method)
>>> >>>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>>> >>>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>>> >>>   at sun.nio.ch.IOUtil.read(IOUtil.java:175)
>>> >>>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>>> >>>   at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>> >>>   at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>> >>>   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>> >>>   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>> >>>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>>> >>>   at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>>> >>>   at java.io.DataInputStream.read(DataInputStream.java:132)
>>> >>>   at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262)
>>> >>>   at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309)
>>> >>>   at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373)
>>> >>>   at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525)
>>> >>>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
>>> >>>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>>> >>>   at java.lang.Thread.run(Thread.java:662)
>>> >>>
>>> >>>
>>> >>> This is because I'm running 10 maps per TaskTracker on a 20-node cluster,
>>> >>> and each map opens about 300 files, so that should give 6000 open files at the
>>> >>> same time ... why is this a problem? The maximum # of files per process on
>>> >>> one machine is:
>>> >>>
>>> >>> cat /proc/sys/fs/file-max   ---> 2403545
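The catch, stated here as an assumption about a typical Linux install rather than anything visible in the logs above: /proc/sys/fs/file-max is the system-wide ceiling, while each process is also bound by the much smaller per-process limit from ulimit -n, which defaults to 1024 on many distributions. Every open HDFS stream also ties up sockets and block files inside the single DataNode JVM on that node, so a few thousand concurrent streams can exhaust a 1024 per-process limit even though file-max is in the millions:

    ulimit -n                    # per-process soft limit for the current user (often 1024 by default)
    cat /proc/sys/fs/file-max    # system-wide ceiling (the 2403545 shown above)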
>>> >>>
>>> >>>
>>> >>> Any suggestions?
>>> >>>
>>> >>> Thanks,
>>> >>> Mark
>>> >>
>>>
>>
>>
>
>
>