hadoop-common-dev mailing list archives

From "Karam Singh (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3531) Hod does not report job tracker failure on hod client side when job tracker fails to come up
Date Mon, 16 Jun 2008 07:37:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605212#action_12605212 ]

Karam Singh commented on HADOOP-3531:
-------------------------------------

To verify this issue, I did the following:
1. Tried a scenario where the --gridservice-mapred.pkgs and --gridservice-hdfs.pkgs paths were correct
on three nodes, with max-master-failure=12. hod allocation with 15 nodes succeeded
three times. Observations from the ringmaster log:
    a. namenode came up in the 2nd retry. jobtracker came up in the 4th retry after 3 failures.
    b. namenode came up in the 9th retry after 8 failures. jobtracker came up in the 1st try.
    c. namenode came up in the first try. jobtracker came up in the 3rd retry after 2 failures.
2. Tried a scenario where the --gridservice-mapred.pkgs path was correct on two nodes, with max-master-failure=13,
using static dfs. hod allocation with 15 nodes succeeded 4 times. Observations from the
ringmaster log: jobtracker came up in the first try for 3 allocations. In the 4th allocation, jobtracker
came up in the 8th retry after 7 failures.
3. Tried a scenario where --hodring.java-home was correct only on the ringmaster node, with max-failures=12.
namenode came up on the ringmaster node. All other 14 hodrings failed to start with an "Invalid --hodring.java-home"
error (observed from the ringmaster log). ringmaster waited 2 mins for mapred before giving up.
4. Tried a scenario where --hodring.java-home was correct on 3 nodes, with max-failures=12. Tried
a hod allocation of 15 nodes. namenode came up on the ringmaster node. 12 hodrings failed with the invalid
--hodring.java-home error. jobtracker, dn and tt came up on the remaining two nodes.
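
The retry counts above are consistent with a simple failure budget: a master is retried until it comes up, or until its failures exceed the configured maximum. The following is a minimal sketch of that reading only, not HOD's actual ringmaster code. The names start_master and attempt_start are hypothetical, and the "flag once failures exceed the maximum" rule is an assumption inferred from the log message "Detected errors (3) beyond allowed number of failures (2)".

```python
def start_master(attempt_start, max_master_failures):
    """Retry a master service until it starts or the failure budget is exceeded.

    attempt_start is a hypothetical callable that returns True when the
    service came up on the node it was tried on, and False otherwise.
    Returns a (came_up, failures) tuple.
    """
    failures = 0
    # Assumption: the error is flagged only once failures exceed the maximum.
    while failures <= max_master_failures:
        if attempt_start():
            return True, failures
        failures += 1
    return False, failures
```

Under this reading, a namenode that starts on its 9th attempt with a maximum of 12 reports 8 failures and succeeds, while a service that never starts with a maximum of 2 gives up after 3 failures.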
    
Also tried some negative tests with max-failures=2:
1. Provided a wrong --hodring.pkgs. Verified that hod allocation fails, as ringmaster failed,
with a proper error message.
2. Provided a wrong path for --gridservice-mapred.pkgs and --gridservice-hdfs.pkgs. Verified
that the proper error message from the ringmaster log is displayed at the hod client side. Also tried with
an invalid tarball.
3. Tried a scenario with the --gridservice-mapred.pkgs and --gridservice-hdfs.pkgs paths correct
only on the ringmaster node, with max-master failures=2.
    Tried two times:
    a. hod allocation failed with a proper error message, as jobtracker failed to start. The ringmaster
log also showed: "Detected errors (3) beyond allowed number of failures (2). Flagging error
to client"
    b. hod allocation failed with a proper error message, as namenode failed to start. The ringmaster
log also showed: "Detected errors (3) beyond allowed number of failures (2). Flagging error
to client"
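
The ringmaster log check in the negative tests can be scripted. The sketch below only matches the message text quoted above. The function name find_failure_flags is hypothetical, and it assumes the message appears on a single line in the real log (it is wrapped in this mail); the rest of the log line format is not assumed.

```python
import re

# Matches the message quoted from the ringmaster log in this comment.
FLAG_PATTERN = re.compile(
    r"Detected errors \((\d+)\) beyond allowed number of failures \((\d+)\)"
)

def find_failure_flags(log_lines):
    """Return (detected_errors, allowed_failures) for each matching log line."""
    flags = []
    for line in log_lines:
        match = FLAG_PATTERN.search(line)
        if match:
            flags.append((int(match.group(1)), int(match.group(2))))
    return flags
```

An empty result for a failed allocation would correspond to the behaviour originally reported for jobtracker failures, where no such line was logged.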


> Hod does not report job tracker failure on hod client side when job tracker fails to come up
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3531
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3531
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>    Affects Versions: 0.18.0
>            Reporter: Karam Singh
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: 3531.patch
>
>
> Hod does not report job tracker failure on the hod client side when the job tracker fails to come up.
> When max-master-failure > 1,
> the hod client does not properly show why the job tracker failed to come up, while in the namenode case
> a proper error message is displayed.
> Also, on namenode failure the ringmaster log contains information such as: "Detected errors
> (3) beyond allowed number of failures (2). Flagging error to client",
> while no such information is there in the ringmaster log for job tracker failures.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

