hadoop-mapreduce-issues mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-754) NPE in expiry thread when a TT is lost
Date Fri, 13 Nov 2009 10:20:39 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777434#action_12777434 ]

Hemanth Yamijala commented on MAPREDUCE-754:
--------------------------------------------

Some more comments:

- It would be useful to add a javadoc for getNumberOfUniqueHosts that states why blacklisted
hosts must be excluded from this count. Please recall our offline discussion about why
blacklisted hosts must be left out of this number when scheduling.
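
As a rough illustration of the javadoc being asked for, here is a minimal sketch. HostCountSketch,
addTracker and blacklistHost are hypothetical stand-ins and not the JobTracker code; only the
method name getNumberOfUniqueHosts comes from the patch. The reason given in the javadoc is an
assumption, since the offline discussion is not recorded in this thread.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the JobTracker's host bookkeeping; not Hadoop code. */
class HostCountSketch {
    // host name -> number of trackers registered from that host
    private final Map<String, Integer> trackersPerHost = new HashMap<>();
    // hosts that are currently blacklisted
    private final Set<String> blacklistedHosts = new HashSet<>();

    void addTracker(String host) {
        trackersPerHost.merge(host, 1, Integer::sum);
    }

    void blacklistHost(String host) {
        // Only hosts that have registered a tracker can be blacklisted here.
        if (trackersPerHost.containsKey(host)) {
            blacklistedHosts.add(host);
        }
    }

    /**
     * Returns the number of distinct hosts that have at least one tracker
     * and are not blacklisted.
     *
     * <p>Blacklisted hosts are excluded because (assumed reason) the scheduler
     * should only count hosts that can actually be given tasks.
     */
    int getNumberOfUniqueHosts() {
        return trackersPerHost.size() - blacklistedHosts.size();
    }
}
```

Two trackers on the same host count once, and a blacklisted host does not count at all.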

Other comments on test cases:
- In TestLostTracker, please move case 3 into its own test case. It is generally good
unit testing practice to test separate conditions in separate tests.
- We can assert some state after case 2 and case 3 in addition to just making sure method
calls succeed. For example, in the case of blacklisting, we can check that the number of active
hosts is decremented by the right value (this is a good check because we are changing that
API as well). Likewise we can also check that a host is blacklisted, that a job is finished, etc.
- Please create the hosts.exclude file in a folder relative to TEST_DIR.
- The testcase testBlacklistedNodeDecommissioning can blacklist a node through global blacklisting
rather than through the health check script, which is slightly more complicated. One reason for
doing so is that we can do this without having to wait for blacklisting to happen asynchronously.
- Common code in this class for global blacklisting due to job failures, and for the refresh
of hosts, can also be refactored into separate utility methods and reused.
- Instead of checking if the decommissioned tracker is not present in the list of trackers,
since we are starting with only one tracker, we can explicitly check that the number of trackers
in jt.taskTrackers is 0.
- Some additional tests that I can suggest:
-- Blacklist + decommission when there are multiple trackers per host
-- Have a cluster with 3 trackers, blacklist one of them, decommission 2 of them, and make
sure the active, decommissioned and blacklisted counts all match.
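
The last suggested test could be sketched roughly as below. This is a self-contained model, not
a test against the real JobTracker: TrackerCountsSketch and all of its methods are hypothetical,
and it assumes that the blacklisted tracker is not one of the two decommissioned trackers and
that a blacklisted tracker is not counted as active.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory model of tracker bookkeeping; not Hadoop code. */
class TrackerCountsSketch {
    private final Set<String> active = new HashSet<>();
    private final Set<String> blacklisted = new HashSet<>();
    private final Set<String> decommissioned = new HashSet<>();

    void addTracker(String name) {
        active.add(name);
    }

    // Assumption: a blacklisted tracker stays known but is no longer active.
    void blacklist(String name) {
        if (active.remove(name)) {
            blacklisted.add(name);
        }
    }

    // A decommissioned tracker leaves both the active and the blacklisted sets.
    void decommission(String name) {
        // Non-short-circuit '|' so that both removals are always attempted.
        boolean known = active.remove(name) | blacklisted.remove(name);
        if (known) {
            decommissioned.add(name);
        }
    }

    int activeCount() { return active.size(); }
    int blacklistedCount() { return blacklisted.size(); }
    int decommissionedCount() { return decommissioned.size(); }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        TrackerCountsSketch cluster = new TrackerCountsSketch();
        cluster.addTracker("tt1");
        cluster.addTracker("tt2");
        cluster.addTracker("tt3");

        cluster.blacklist("tt1");     // blacklist one tracker
        cluster.decommission("tt2");  // decommission the other two
        cluster.decommission("tt3");

        check(cluster.activeCount() == 0, "active count");
        check(cluster.decommissionedCount() == 2, "decommissioned count");
        check(cluster.blacklistedCount() == 1, "blacklisted count");
    }
}
```

A variant where the blacklisted tracker is itself decommissioned would be worth a separate test,
since the blacklisted count should then drop as well.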

> NPE in expiry thread when a TT is lost
> --------------------------------------
>
>                 Key: MAPREDUCE-754
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-754
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.1
>            Reporter: Ramya R
>            Assignee: Amar Kamat
>            Priority: Minor
>             Fix For: 0.22.0
>
>         Attachments: mapreduce-754-v1.1.patch, mapreduce-754-v1.2.patch, mapreduce-754-wip.patch
>
>
> NullPointerException is obtained in Tracker Expiry Thread. Below is the exception obtained
> in the JT logs 
> {noformat}
> ERROR org.apache.hadoop.mapred.JobTracker: Tracker Expiry Thread got exception: java.lang.NullPointerException
>         at org.apache.hadoop.mapred.JobTracker.updateTaskTrackerStatus(JobTracker.java:2971)
>         at org.apache.hadoop.mapred.JobTracker.access$300(JobTracker.java:104)
>         at org.apache.hadoop.mapred.JobTracker$ExpireTrackers.run(JobTracker.java:381)
>         at java.lang.Thread.run(Thread.java:619)
> {noformat}
> The steps to reproduce this issue are:
> * Blacklist a TT. 
> * Restart it. 
> * The above exception is obtained when the first instance of TT is marked as lost.
> However the above exception does not break any functionality.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

