hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Too many open files Error
Date Thu, 26 Jan 2012 20:21:09 GMT
Agree with Raj V here - given that stacktrace, your problem should not be
the # of transfer threads nor the number of open files.

And the value you've set for the transfer threads is far beyond the
recommended 4k/8k - I would not recommend doing that. The default in
1.0.0 is 256; 2048/4096 are good values to set when you notice increased
HDFS load, or when running services like HBase.
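
For reference, a minimal hdfs-site.xml sketch along those lines - 4096 is just an
illustrative value from the 2048/4096 range above (later Hadoop releases rename
this setting to dfs.datanode.max.transfer.threads):

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <!-- illustrative value from the 2048/4096 range suggested above -->
      <value>4096</value>
    </property>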

You should instead focus on why it is this particular job (or even this
particular task, which is important to notice) that fails, and not other
jobs (or other task attempts).

On Fri, Jan 27, 2012 at 1:10 AM, Raj V <rajvish@yahoo.com> wrote:
> Mark
>
> You have this "Connection reset by peer". Why do you think this problem is
> related to too many open files?
>
> Raj
>
>
>
>>________________________________
>> From: Mark question <markq2011@gmail.com>
>>To: common-user@hadoop.apache.org
>>Sent: Thursday, January 26, 2012 11:10 AM
>>Subject: Re: Too many open files Error
>>
>>Hi again,
>>I've tried:
>>     <property>
>>        <name>dfs.datanode.max.xcievers</name>
>>        <value>1048576</value>
>>      </property>
>>but I'm still getting the same error ... how high can I go??
>>
>>Thanks,
>>Mark
>>
>>
>>
>>On Thu, Jan 26, 2012 at 9:29 AM, Mark question <markq2011@gmail.com> wrote:
>>
>>> Thanks for the reply.... I have nothing about dfs.datanode.max.xceivers in
>>> my hdfs-site.xml, so hopefully this will solve the problem. As for ulimit -n,
>>> I'm running on an NFS cluster, so usually I just start Hadoop with a single
>>> bin/start-all.sh ... Do you think I can add it via bin/Datanode -ulimit n ?
>>>
>>> Mark
>>>
>>>
>>> On Thu, Jan 26, 2012 at 7:33 AM, Mapred Learn <mapred.learn@gmail.com> wrote:
>>>
>>>> You need to set ulimit -n <bigger value> on the datanodes and restart them.
>>>>
>>>> Sent from my iPhone
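
A rough sketch of how that is usually done, for readers of the archive - the
user name "hdfs" and the 32768 limit below are illustrative assumptions, not
values from this thread:

    # 1) Check the current per-user limit as the user that runs the datanode
    ulimit -n

    # 2) Raise it, e.g. by adding lines like these to /etc/security/limits.conf
    #    (illustrative user name and value):
    #      hdfs  soft  nofile  32768
    #      hdfs  hard  nofile  32768

    # 3) Log in again so the new limit applies, then restart Hadoop
    #    (here via the scripts already used in this thread)
    ulimit -n                            # should now report the raised value
    bin/stop-all.sh && bin/start-all.sh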
>>>>
>>>> On Jan 26, 2012, at 6:06 AM, Idris Ali <psychidris@gmail.com> wrote:
>>>>
>>>> > Hi Mark,
>>>> >
>>>> > On a lighter note, what is the count of xceivers, i.e. the
>>>> > dfs.datanode.max.xceivers property in hdfs-site.xml?
>>>> >
>>>> > Thanks,
>>>> > -idris
>>>> >
>>>> > On Thu, Jan 26, 2012 at 5:28 PM, Michel Segel <michael_segel@hotmail.com> wrote:
>>>> >
>>>> >> Sorry going from memory...
>>>> >> As user Hadoop or mapred or hdfs what do you see when you do a ulimit -a?
>>>> >> That should give you the number of open files allowed by a single user...
>>>> >>
>>>> >>
>>>> >> Sent from a remote device. Please excuse any typos...
>>>> >>
>>>> >> Mike Segel
>>>> >>
>>>> >> On Jan 26, 2012, at 5:13 AM, Mark question <markq2011@gmail.com> wrote:
>>>> >>
>>>> >>> Hi guys,
>>>> >>>
>>>> >>>  I get this error from a job trying to process 3 million records.
>>>> >>>
>>>> >>> java.io.IOException: Bad connect ack with firstBadLink 192.168.1.20:50010
>>>> >>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2903)
>>>> >>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
>>>> >>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>>>> >>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>>>> >>>
>>>> >>> When I checked the logfile of datanode-20, I see:
>>>> >>>
>>>> >>> 2012-01-26 03:00:11,827 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>>>> >>> DatanodeRegistration(192.168.1.20:50010, storageID=DS-97608578-192.168.1.20-50010-1327575205369,
>>>> >>> infoPort=50075, ipcPort=50020):DataXceiver
>>>> >>> java.io.IOException: Connection reset by peer
>>>> >>>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>> >>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>>>> >>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>>>> >>>     at sun.nio.ch.IOUtil.read(IOUtil.java:175)
>>>> >>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>>>> >>>     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>> >>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>> >>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>> >>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>> >>>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>>>> >>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>>>> >>>     at java.io.DataInputStream.read(DataInputStream.java:132)
>>>> >>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262)
>>>> >>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309)
>>>> >>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373)
>>>> >>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525)
>>>> >>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
>>>> >>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>>>> >>>     at java.lang.Thread.run(Thread.java:662)
>>>> >>>
>>>> >>>
>>>> >>> Which is because I'm running 10 maps per taskTracker on a 20-node cluster;
>>>> >>> each map opens about 300 files, so that should give 6000 opened files at
>>>> >>> the same time ... why is this a problem? The maximum # of files per process
>>>> >>> on one machine is:
>>>> >>>
>>>> >>> cat /proc/sys/fs/file-max   ---> 2403545
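
A note for readers of the archive: /proc/sys/fs/file-max is the kernel-wide ceiling
on open file handles, while the limit that usually triggers "Too many open files" is
the much lower per-process/per-user one reported by ulimit -n (commonly 1024 by
default). A quick way to compare the two on a datanode host (<pid> below is a
placeholder for a datanode or task process id):

    # system-wide maximum number of open file handles
    cat /proc/sys/fs/file-max

    # per-process limit for the current shell/user (often only 1024 by default)
    ulimit -n

    # limits actually in effect for an already-running process, e.g. a datanode
    grep 'open files' /proc/<pid>/limits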
>>>> >>>
>>>> >>>
>>>> >>> Any suggestions?
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Mark
>>>> >>
>>>>
>>>
>>>
>>
>>
>>



-- 
Harsh J
Customer Ops. Engineer, Cloudera
