hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-16) RPC call times out while indexing map task is computing splits
Date Wed, 22 Feb 2006 21:34:13 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-16?page=comments#action_12367417 ] 

Doug Cutting commented on HADOOP-16:
------------------------------------

Eric Baldeschwieler wrote:
> [...] why not just dedicate a thread  
> to planning and then load a complete plan?  That can produce more  
> optimal placement and a simpler to understand initialization sequence.

A separate thread in the JobTracker?  That could be a good approach.  We'd have a queue of submitted but as-yet unplanned jobs.  The thread would then pop a job off the queue, compute its splits, and start populating a tasktracker->split table.  When tasktrackers poll for work, they can consult this table, potentially while the thread is still populating it.

I'm hesitant to move this out of the JobTracker into the TaskTracker, since that introduces
complexity.  But a single thread in the JobTracker should be simple to add and should mostly
solve this.  +1
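
For illustration, here is a minimal sketch of that planner-thread idea.  This is not the actual Hadoop 0.1 API; the class and member names below (PlannerSketch, JobInProgress, Split, pickTrackerFor) are hypothetical placeholders, but the structure follows the description above: submitJob only enqueues and returns, a single background thread computes splits and fills a tasktracker->split table, and tasktrackers consult that table when they poll for work.

// Minimal sketch of the planner-thread idea described above.  These names
// are hypothetical, not the actual Hadoop 0.1 API.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class PlannerSketch {
  interface Split {}
  interface JobInProgress {
    String getId();
    List<Split> computeSplits();   // potentially slow: scans the whole input
  }

  // Queue of submitted but as-yet unplanned jobs.
  private final BlockingQueue<JobInProgress> unplanned = new LinkedBlockingQueue<>();

  // tasktracker -> splits table, populated incrementally by the planner thread.
  private final Map<String, List<Split>> trackerToSplits = new ConcurrentHashMap<>();

  // RPC entry point: enqueue and return immediately, so the call cannot time out.
  public String submitJob(JobInProgress job) {
    unplanned.add(job);
    return job.getId();
  }

  // Single planner thread: pop a job, compute its splits, fill in the table.
  public void startPlanner() {
    Thread planner = new Thread(() -> {
      try {
        while (true) {
          JobInProgress job = unplanned.take();
          for (Split split : job.computeSplits()) {
            String tracker = pickTrackerFor(split);          // placement decision
            trackerToSplits
                .computeIfAbsent(tracker,
                    t -> Collections.synchronizedList(new ArrayList<>()))
                .add(split);
          }
        }
      } catch (InterruptedException e) {
        // shut down quietly
      }
    }, "split-planner");
    planner.setDaemon(true);
    planner.start();
  }

  // Called when a tasktracker polls for work; usable while planning is still under way.
  public Split pollForTask(String trackerName) {
    List<Split> splits = trackerToSplits.get(trackerName);
    if (splits == null) return null;
    synchronized (splits) {
      return splits.isEmpty() ? null : splits.remove(0);
    }
  }

  // Hypothetical placement policy, e.g. prefer a tracker that holds the split's data.
  private String pickTrackerFor(Split split) {
    return "tracker_0";
  }
}

A real JobTracker would also have to cope with lost trackers (like tracker_56288 in the report below) and non-local assignments, but the point of the sketch is simply that split computation happens off the RPC path.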

> RPC call times out while indexing map task is computing splits
> --------------------------------------------------------------
>
>          Key: HADOOP-16
>          URL: http://issues.apache.org/jira/browse/HADOOP-16
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>  Environment: MapReduce multi-computer crawl environment: 11 machines (1 master with JobTracker/NameNode, 10 slaves with TaskTrackers/DataNodes)
>     Reporter: Chris Schneider
>     Assignee: Mike Cafarella
>      Fix For: 0.1
>  Attachments: patch.16
>
> We've been using Nutch 0.8 (MapReduce) to perform some internet crawling. Things seemed to be going well until...
> 060129 222409 Lost tracker 'tracker_56288'
> 060129 222409 Task 'task_m_10gs5f' has been lost.
> 060129 222409 Task 'task_m_10qhzr' has been lost.
>    ........
>    ........
> 060129 222409 Task 'task_r_zggbwu' has been lost.
> 060129 222409 Task 'task_r_zh8dao' has been lost.
> 060129 222455 Server handler 8 on 8010 caught: java.net.SocketException: Socket closed
> java.net.SocketException: Socket closed
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at org.apache.nutch.ipc.Server$Handler.run(Server.java:216)
> 060129 222455 Adding task 'task_m_cia5po' to set for tracker 'tracker_56288'
> 060129 223711 Adding task 'task_m_ffv59i' to set for tracker 'tracker_25647'
> I'm hoping that someone could explain why task_m_cia5po got added to tracker_56288 after this tracker was lost.
> The Crawl.main process died with the following output:
> 060129 221129 Indexer: adding segment: /user/crawler/crawl-20060129091444/segments/20060129200246
> Exception in thread "main" java.io.IOException: timed out waiting for response
>     at org.apache.nutch.ipc.Client.call(Client.java:296)
>     at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
>     at $Proxy1.submitJob(Unknown Source)
>     at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
>     at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
>     at org.apache.nutch.indexer.Indexer.index(Indexer.java:263)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:127)
> However, it definitely seems as if the JobTracker is still waiting for the job to finish (no failed jobs).
> Doug Cutting's response:
> The bug here is that the RPC call times out while the map task is computing splits.  The fix is that the job tracker should not compute splits until after it has returned from the submitJob RPC.  Please submit a bug in Jira to help remind us to fix this.
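
The timeout itself is easy to reproduce in isolation.  The toy program below uses plain java.util.concurrent rather than the Nutch IPC classes, and only shows the shape of the failure: the caller waits a bounded time for a response while the "server" is still busy, which is what happens when submitJob sits there computing splits.

// Toy illustration of the failure mode, not the Nutch IPC code: the client
// gives up after a fixed timeout even though the server is still working.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class RpcTimeoutDemo {
  public static void main(String[] args) throws Exception {
    ExecutorService server = Executors.newSingleThreadExecutor();

    // Stand-in for submitJob() computing splits over a large input.
    Future<String> response = server.submit(() -> {
      Thread.sleep(5_000);                       // "computing splits"
      return "job_0001";
    });

    try {
      // The client only waits 2 seconds, like an IPC call with a fixed timeout.
      System.out.println("submitted: " + response.get(2, TimeUnit.SECONDS));
    } catch (TimeoutException e) {
      // The client sees "timed out waiting for response", as Crawl.main did,
      // even though the server side keeps working on the job.
      System.out.println("timed out waiting for response");
    } finally {
      server.shutdownNow();
    }
  }
}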


