hbase-issues mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-15660) Printing extra refs in ITBLL.Verify reducer can cause reducers to be killed due to lack of progress
Date Tue, 19 Apr 2016 03:40:25 GMT

     [ https://issues.apache.org/jira/browse/HBASE-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser resolved HBASE-15660.
--------------------------------
       Resolution: Invalid
         Assignee:     (was: Josh Elser)
    Fix Version/s:     (was: 1.2.2)
                       (was: 1.1.5)
                       (was: 2.0.0)

Nope, I'm wrong. The message was just the general reducer timeout, not because of lack of
progress. Fine as is.

> Printing extra refs in ITBLL.Verify reducer can cause reducers to be killed due to lack of progress
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15660
>                 URL: https://issues.apache.org/jira/browse/HBASE-15660
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>            Reporter: Josh Elser
>            Priority: Minor
>
> In debugging an ITBLL job which has numerous failures, I saw that instead of the Verify job completing and reporting that there were a large number of UNDEF nodes, the reducers in the Verify were failing due to lack of progress.
> The reducer's syslog file was filled with information from the {{dumpExtraInfoOnRefs()}} method. I believe that when a reducer is repeatedly doing these lookups, the MR framework doesn't realize that any progress is being made (nothing is being written to the context) and eventually kills the reducer task. This ultimately causes the entire Verify job to fail because the reducer fails in the same manner each time.
> We should make sure to invoke {{context.progress()}} when we do these lookups to let the framework know that we're still doing "our thing".
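
For illustration only, the pattern proposed in the description could look roughly like the sketch below: report progress once per lookup so that a long run of lookups with no output is not mistaken for a hung task. This is not the ITBLL code. `ProgressReporter` is a local stand-in for the `progress()` method on the Hadoop reducer context, and `lookupRef` and `dumpExtraInfo` are hypothetical names, since the body of `dumpExtraInfoOnRefs()` does not appear in this message. Per the resolution above, the observed timeout turned out not to be progress-related, so the change was not needed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProgressDuringLookups {

    /** Local stand-in (assumption) for the progress() call on the reducer context. */
    public interface ProgressReporter {
        void progress();
    }

    /** Hypothetical placeholder for the per-reference lookup; the real one would query HBase. */
    public static String lookupRef(String ref) {
        return "info for " + ref;
    }

    /** Looks up every extra ref and reports progress after each lookup. */
    public static List<String> dumpExtraInfo(List<String> refs, ProgressReporter context) {
        List<String> out = new ArrayList<>();
        for (String ref : refs) {
            out.add(lookupRef(ref));
            // Nothing is written to the context here, so signal liveness explicitly.
            context.progress();
        }
        return out;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        List<String> info = dumpExtraInfo(Arrays.asList("a", "b", "c"), () -> calls[0]++);
        System.out.println(info.size() + " lookups, " + calls[0] + " progress calls");
    }
}
```

In a real reducer the `context` argument would be the reducer's own context object rather than a lambda; the counter in `main` only demonstrates that one progress call is made per lookup.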



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
