hbase-issues mailing list archives

From "Appy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16775) Flakey test with TestExportSnapshot#testExportRetry and TestMobExportSnapshot#testExportRetry
Date Sat, 25 Mar 2017 02:33:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941526#comment-15941526 ]

Appy commented on HBASE-16775:
------------------------------

Things tried so far:
Dumped the configuration to the logs to confirm that the MR job is correctly picking up
mapreduce.map.maxattempts.

Changed the code to use a MiniMapReduce cluster, which spawns 2 servers by default.
Looking at the minicluster logs, I do see retries happening.

At this point I can't think of a way to make it work. Summarizing everything:
What this test is trying to verify is: if a mapper fails and retries are enabled, then the overall
job should still pass.
To do so, it previously threw an exception from the mapper based on a probability, which is
inherently flaky.
What I was trying to do instead is set retries to Y and throw exceptions X times, where X < Y.
Initially X is 0, and it is incremented on every injected failure. The issue is that, since mapper
runs are isolated, I can't find a way to maintain the state of X across mapper attempts. As a
result, even the 4th retry of a mapper will see X = 0 initially.
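
To make the problem concrete, here is a minimal standalone sketch in plain Java. It does not use
Hadoop at all; FlakyMapper, runJobIsolated and runJobShared are hypothetical names made up for
this illustration, and the retry loop only stands in for the MR framework. It shows that a
failure counter which starts fresh for every attempt never reaches its limit, so the job fails
even though X < Y, while a counter that survives across attempts would let the job pass.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySimulation {

    /** Hypothetical stand-in for a mapper that injects failures while its counter is below a limit. */
    static final class FlakyMapper {
        private final AtomicInteger injectedFailures; // the "X" from the comment above

        FlakyMapper(AtomicInteger injectedFailures) {
            this.injectedFailures = injectedFailures;
        }

        void run(int failuresToInject) {
            if (injectedFailures.get() < failuresToInject) {
                injectedFailures.incrementAndGet();
                throw new IllegalStateException("injected failure");
            }
        }
    }

    /** Each attempt gets a brand new counter, as with isolated mapper runs. */
    static boolean runJobIsolated(int maxAttempts, int failuresToInject) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            FlakyMapper mapper = new FlakyMapper(new AtomicInteger(0)); // X is 0 again
            try {
                mapper.run(failuresToInject);
                return true; // attempt succeeded, job passes
            } catch (IllegalStateException e) {
                // attempt failed, the loop retries
            }
        }
        return false; // ran out of attempts
    }

    /** All attempts share one counter; this is the state the test would need to keep. */
    static boolean runJobShared(int maxAttempts, int failuresToInject) {
        AtomicInteger shared = new AtomicInteger(0);
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            FlakyMapper mapper = new FlakyMapper(shared);
            try {
                mapper.run(failuresToInject);
                return true;
            } catch (IllegalStateException e) {
                // attempt failed, the loop retries
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(runJobIsolated(4, 2)); // prints false: every attempt sees X = 0
        System.out.println(runJobShared(4, 2));   // prints true: the third attempt sees X = 2
    }
}
```

In the real test the attempts do not share memory in this way, so only the first variant
reflects what I am seeing.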

Now I am thinking that my initial line of thought (in [this|https://issues.apache.org/jira/browse/HBASE-16775?focusedCommentId=15553215&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15553215]
 comment above) was right: this test is testing the internals of MapReduce, i.e. that if
mapreduce.map.maxattempts is set, the MR framework should retry.
[~huaxiang], [~jmhsieh].

> Flakey test with TestExportSnapshot#testExportRetry and TestMobExportSnapshot#testExportRetry
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-16775
>                 URL: https://issues.apache.org/jira/browse/HBASE-16775
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>         Attachments: disable.patch, HBASE-16775.master.001.patch, HBASE-16775.master.002.patch, HBASE-16775.master.003.patch
>
>
> The root cause is that conf.setInt("mapreduce.map.maxattempts", 10) is not picked up by the
> mapper job, so the effective retry count is 0. Debugging to see why this is the case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
