hadoop-common-user mailing list archives

From Venkat Seeth <sv...@yahoo.com>
Subject RE: Reduce hangs at times
Date Sat, 24 Feb 2007 19:42:10 GMT
Thanks, Igor, for your input.

I'm not using DFS; I'm using the local FS on a NetApp.
I'm running Hadoop 0.11.2 on 64-bit SUSE Linux.

Venkat

--- Igor Bolotin <igorb@collarity.com> wrote:

> I just observed similar behavior this morning. The thread dump on one
> of the Jetty servers showed one thread trying to open a file on DFS,
> while all the other threads waited for it because of synchronization:
> 
> "pool-2-thread-110" prio=1 tid=0x08aa8bc8 nid=0x7570 runnable [0x0015f000..0x00160130]
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>         - locked <0xaaa02f30> (a java.io.BufferedInputStream)
>         at java.io.DataInputStream.readFully(DataInputStream.java:176)
>         at java.io.DataInputStream.readLong(DataInputStream.java:380)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:615)
>         - locked <0xaaa02f98> (a org.apache.hadoop.dfs.DFSClient$DFSInputStream)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:702)
>         - locked <0xaaa02f98> (a org.apache.hadoop.dfs.DFSClient$DFSInputStream)
>         at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:189)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>         - locked <0xaaa03040> (a org.apache.hadoop.fs.FSDataInputStream$Buffer)
>         at java.io.DataInputStream.readFully(DataInputStream.java:176)
>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>         at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:60)
>         at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:279)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:262)
>         at com.collarity.cdata.CDataSegmentReader.openDataStream(CDataSegmentReader.java:132)
>         at com.collarity.cdata.CDataSegmentReader.getDataStream(CDataSegmentReader.java:118)
>         - locked <0x1bd29208> (a com.collarity.cdata.CDataSegmentReader)
>         at com.collarity.cdata.CDataSegmentReader.readDocument(CDataSegmentReader.java:95)
> ...
> 
> Unfortunately, since it happened on a production system, I didn't
> have any time to research it further.
> Once I stopped mapreduce (job tracker, task trackers) and killed all
> outstanding tasks, Jetty got back to normal.
> 
> The only suspicious line in the NameNode logs around that timeframe was:
>
> 2007-02-24 09:25:58,792 INFO [org.apache.hadoop.dfs.FSNamesystem$HeartbeatMonitor@24a4e2e3] [] StateChange          : BLOCK* NameSystem.heartbeatCheck: lost heartbeat from sf3-1:50010
> 
> Well, if and when it happens again, I'll try to investigate it further.
> 
> Igor
> 
> P.S. We are using Hadoop version 0.10.1. I guess the first thing for
> us to try would be to upgrade to the latest version.
> 
> 
> -----Original Message-----
> From: Venkat Seeth [mailto:svejb@yahoo.com] 
> Sent: Saturday, February 24, 2007 9:32 AM
> To: hadoop-user@lucene.apache.org
> Subject: Reduce hangs at times
> 
> Hi there,
> 
> Howdy. I observe at times that a few of the reduce tasks hang during
> the copy phase without failing. Hence these tasks never complete, and
> they are never rerun after a timeout.
> 
> reduce > copy (1510 of 1540 at 1.57 MB/s) >
> 
> At the same time, I see that Jetty is out of threads in its thread pool.
> I don't know if these two are related.
> 
> I also see the following exception for many of the
> MR operations.
> 
> 24 Feb 2007 02:52:14,438  WARN - No thread for Socket[addr=/10.163.63.137,port=56019,localport=50060] - at org.mortbay.util.ThreadPool.run(ThreadPool.java:373)
> 24 Feb 2007 02:52:24,440  WARN - No thread for Socket[addr=/10.163.63.137,port=56023,localport=50060] - at org.mortbay.util.ThreadPool.run(ThreadPool.java:373)
> 24 Feb 2007 02:53:01,582  WARN - getMapOutput(task_0005_m_000432_0,30) failed :
> java.net.SocketException: Connection reset
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at org.mortbay.http.ChunkingOutputStream.bypassWrite(ChunkingOutputStream.java:151)
>         at org.mortbay.http.BufferedOutputStream.write(BufferedOutputStream.java:139)
>         at org.mortbay.http.HttpOutputStream.write(HttpOutputStream.java:423)
>         at org.mortbay.jetty.servlet.ServletOut.write(ServletOut.java:54)
>         at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1574)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
>         at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
>         at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
>         at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
>         at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
>         at org.mortbay.http.HttpServer.service(HttpServer.java:954)
>         at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
>         at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
>         at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
>         at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
>         at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
>         at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
>  - at org.apache.hadoop.mapred.TaskTracker.doGet(TaskTracker.java:1600)
> 24 Feb 2007 02:53:01,583  WARN - getMapOutput(task_0005_m_001465_0,55) failed :
> 
> Has anyone experienced this? Any thoughts are
> greatly appreciated.
> 
> Thanks,
> Venkat
> 
=== message truncated ===
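
The thread dump quoted above shows one thread holding monitors while it sits in a socket read, with other threads queued behind it. As a rough illustration of how that pattern can be spotted from inside a JVM instead of by reading a dump by hand, here is a minimal sketch using the standard java.lang.management API. It is not code from Hadoop or Jetty. The class name BlockedThreadReport, the thread names "holder" and "waiter", and the demo() helper are all made up for this example, and the latch only stands in for a slow read.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper, not part of Hadoop or Jetty.
class BlockedThreadReport {

    /** Returns one line per thread that is BLOCKED on a monitor, naming the lock owner. */
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName()
                        + " waits for lock held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    /**
     * Reproduces the pattern in miniature: "holder" keeps a monitor while it
     * waits on a latch (a stand-in for a slow socket read), and "waiter"
     * needs the same monitor. Returns the report taken while waiter is stuck.
     */
    static List<String> demo() {
        final Object lock = new Object();
        final CountDownLatch held = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                try {
                    release.await();          // simulated slow I/O under the lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // nothing to do; acquiring the monitor is the point
            }
        }, "waiter");
        try {
            holder.start();
            held.await();                     // holder now owns the monitor
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);             // wait until waiter is queued on the monitor
            }
            List<String> report = blockedThreads();
            release.countDown();              // let both threads finish
            holder.join();
            waiter.join();
            return report;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

In a server, something like blockedThreads() could be called from a periodic task or an admin servlet. When many pool threads name the same owner, that owner's stack is the first place to look.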



 