hadoop-common-dev mailing list archives

From Nigel Daley <nda...@yahoo-inc.com>
Subject Re: [jira] Commented: (HADOOP-600) Race condition in JobTracker updating the task tracker's status while declaring it lost
Date Mon, 08 Jan 2007 19:09:18 GMT
The patch build failed because 2 tests, TestReplication and  
TestRestartDFS, failed on RHEL 4.  I see that both test logs contain  
these exceptions:

TestReplication:
     [junit] Data node crashed:
     [junit] java.lang.NullPointerException
     [junit] 	at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:304)
     [junit] 	at org.apache.hadoop.ipc.Client.call(Client.java:455)
     [junit] 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
     [junit] 	at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
     [junit] 	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:248)
     [junit] 	at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:227)
     [junit] 	at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:225)
     [junit] 	at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:171)
     [junit] 	at org.apache.hadoop.dfs.MiniDFSCluster$DataNodeRunner.run(MiniDFSCluster.java:118)
     [junit] 	at java.lang.Thread.run(Thread.java:595)
     [junit] 2007-01-08 18:33:13,436 INFO  ipc.Client (Client.java:run(279)) - java.lang.NullPointerException
     [junit] 	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:247)

and TestRestartDFS:
     [junit] Data node crashed:
     [junit] java.lang.NullPointerException
     [junit] 	at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:304)
     [junit] 	at org.apache.hadoop.ipc.Client.call(Client.java:455)
     [junit] 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
     [junit] 	at org.apache.hadoop.dfs.$Proxy0.register(Unknown Source)
     [junit] 	at org.apache.hadoop.dfs.DataNode.register(DataNode.java:295)
     [junit] 	at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:183)
     [junit] 	at org.apache.hadoop.dfs.MiniDFSCluster$DataNodeRunner.run(MiniDFSCluster.java:118)
     [junit] 	at java.lang.Thread.run(Thread.java:595)
     [junit] 2007-01-08 18:35:21,223 INFO  util.ThreadedServer (ThreadedServer.java:run(656)) - Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=50092]
     [junit] 2007-01-08 18:35:21,223 INFO  ipc.Client (Client.java:run(279)) - java.lang.NullPointerException

(Yes, the "0 attempts" in the QA message below is a script error.  There
was 1 build attempt.)

I'm unsure how reproducible these are.

Nige

On Jan 8, 2007, at 10:49 AM, Hadoop QA (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/HADOOP-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463097 ]
>
> Hadoop QA commented on HADOOP-600:
> ----------------------------------
>
> -1, because 0 attempts failed to build and test the latest attachment
> (http://issues.apache.org/jira/secure/attachment/12348510/HADOOP-600_20070108_1.patch)
> against trunk revision r494137. Please note that this message is
> automatically generated and may represent a problem with the
> automation system and not the patch.
>
>> Race condition in JobTracker updating the task tracker's status  
>> while declaring it lost
>> --------------------------------------------------------------------- 
>> ------------------
>>
>>                 Key: HADOOP-600
>>                 URL: https://issues.apache.org/jira/browse/HADOOP-600
>>             Project: Hadoop
>>          Issue Type: Bug
>>          Components: mapred
>>    Affects Versions: 0.7.1
>>            Reporter: Owen O'Malley
>>         Assigned To: Arun C Murthy
>>             Fix For: 0.10.1
>>
>>         Attachments: HADOOP-600_20070108_1.patch
>>
>>
>> There was a case where the JobTracker lost track of a set of tasks  
>> that were on a task tracker. It appears to be a race condition  
>> because the ExpireTrackers thread doesn't lock the JobTracker  
>> while updating the state. The fix would be to build a list of dead  
>> task trackers and then lock the job tracker while updating their  
>> status.
>
> -- 
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators:
> https://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
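
The fix proposed in the quoted issue description (build a list of dead
task trackers, then lock the job tracker while updating their status)
might look roughly like the Java sketch below. It is not the HADOOP-600
patch: the class ExpirySketch, its fields, and its method names are
hypothetical stand-ins, and the real JobTracker and ExpireTrackers code
is organized differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the JobTracker; not the real Hadoop class. */
class ExpirySketch {
    // Last heartbeat time per tracker, guarded by its own small lock.
    private final Map<String, Long> lastSeen = new HashMap<>();
    // Tasks running on each tracker, guarded by the ExpirySketch monitor.
    private final Map<String, List<String>> tasksByTracker = new HashMap<>();
    // Tasks declared lost, guarded by the ExpirySketch monitor.
    private final List<String> lostTasks = new ArrayList<>();

    /** Status update from a tracker; holds the tracker-wide lock. */
    public synchronized void heartbeat(String tracker, long now, List<String> tasks) {
        synchronized (lastSeen) {
            lastSeen.put(tracker, now);
        }
        tasksByTracker.put(tracker, new ArrayList<>(tasks));
    }

    /** One pass of the (assumed) expiry thread. Returns the trackers declared lost. */
    public List<String> expireTrackers(long now, long timeoutMs) {
        // Step 1: build the list of apparently dead trackers without
        // holding the tracker-wide lock.
        List<String> dead = new ArrayList<>();
        synchronized (lastSeen) {
            for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
                if (now - e.getValue() > timeoutMs) {
                    dead.add(e.getKey());
                }
            }
        }

        // Step 2: take the tracker-wide lock while updating state, so a
        // concurrent heartbeat cannot interleave with the removal.
        List<String> expired = new ArrayList<>();
        synchronized (this) {
            for (String tracker : dead) {
                synchronized (lastSeen) {
                    Long seen = lastSeen.get(tracker);
                    // Re-check: a heartbeat may have arrived after step 1.
                    if (seen == null || now - seen <= timeoutMs) {
                        continue;
                    }
                    lastSeen.remove(tracker);
                }
                List<String> tasks = tasksByTracker.remove(tracker);
                if (tasks != null) {
                    lostTasks.addAll(tasks);
                }
                expired.add(tracker);
            }
        }
        return expired;
    }

    /** Snapshot of the tasks declared lost so far. */
    public synchronized List<String> lostTasks() {
        return new ArrayList<>(lostTasks);
    }
}
```

Both heartbeat and step 2 acquire the locks in the same order (the
ExpirySketch monitor first, then lastSeen), which avoids a deadlock
between the two paths. The re-check inside step 2 is an addition of this
sketch, not something the issue description asks for.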

