hadoop-hdfs-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14314) fullBlockReportLeaseId should be reset after registering to NN
Date Tue, 26 Feb 2019 13:17:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777899#comment-16777899 ]

Hadoop QA commented on HDFS-14314:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m  9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 50s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 51s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 51s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 47s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 12 new + 52 unchanged - 0 fixed = 64 total (was 52) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 53s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  3m 21s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 32s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 52s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 45m 43s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14314 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12960171/HDFS-14314-trunk.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 24b01f282ab8 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 59ba355 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| mvninstall | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt |
| compile | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs.txt |
| javac | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| mvnsite | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/patch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/testReport/ |
| Max. process+thread count | 306 (vs. ulimit of 10000) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/26328/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> fullBlockReportLeaseId should be reset after registering to NN
> --------------------------------------------------------------
>
>                 Key: HDFS-14314
>                 URL: https://issues.apache.org/jira/browse/HDFS-14314
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.8.4
>         Environment:  
>  
>  
>            Reporter: star
>            Priority: Critical
>             Fix For: 2.8.4
>
>         Attachments: HDFS-14314-trunk.001.patch, HDFS-14314-trunk.001.patch, HDFS-14314.0.patch, HDFS-14314.2.patch, HDFS-14314.patch
>
>
>       Since HDFS-7923, to rate-limit DN block reports, a DN asks the active NN for a
> full block report lease ID before sending a full block report, and then sends the full
> block report together with that lease ID. If the lease ID is invalid, the NN rejects the
> full block report and logs "not in the pending set".
>       Consider the case where the NN is restarted while a DN is sending a full block
> report. The DN later sends a full block report with the lease ID acquired from the previous
> NN instance, which is invalid for the new NN instance. Although the DN recognizes the new
> NN instance through its heartbeat and re-registers itself, it does not reset the lease ID
> obtained from the previous instance.
>       This issue may cause DNs to temporarily go dead, making it unsafe to restart the
> NN, especially in Hadoop clusters with a large number of DNs. HDFS-12914 reported the issue
> without any clue as to why it occurred, and it remains unsolved.
>       To make this clear, consider the code below, taken from the offerService method
> of class BPServiceActor with unrelated code removed. fullBlockReportLeaseId is a local
> variable that holds the lease ID from the NN. While the NN is restarting, the blockReport
> call throws an exception, which is caught by the catch block inside the while loop, so
> fullBlockReportLeaseId is never reset to 0. After the NN has restarted, the DN sends a
> full block report that the new NN instance rejects, and it will not send another full block
> report until the next scheduled one, about an hour later.
>       The solution is simple: reset fullBlockReportLeaseId to 0 after any exception or
> after registering with the NN, so that the DN asks the new NN instance for a valid
> fullBlockReportLeaseId.
> {code:java}
> private void offerService() throws Exception {
>   long fullBlockReportLeaseId = 0;
>   //
>   // Now loop for a long time....
>   //
>   while (shouldRun()) {
>     try {
>       final long startTime = scheduler.monotonicNow();
>       //
>       // Every so often, send heartbeat or block-report
>       //
>       final boolean sendHeartbeat = scheduler.isHeartbeatDue(startTime);
>       HeartbeatResponse resp = null;
>       if (sendHeartbeat) {
>         // Only ask the NN for a lease when we do not already hold one.
>         boolean requestBlockReportLease = (fullBlockReportLeaseId == 0) &&
>                 scheduler.isBlockReportDue(startTime);
>         scheduler.scheduleNextHeartbeat();
>         if (!dn.areHeartbeatsDisabledForTests()) {
>           resp = sendHeartBeat(requestBlockReportLease);
>           assert resp != null;
>           if (resp.getFullBlockReportLeaseId() != 0) {
>             if (fullBlockReportLeaseId != 0) {
>               LOG.warn(nnAddr + " sent back a full block report lease " +
>                       "ID of 0x" +
>                       Long.toHexString(resp.getFullBlockReportLeaseId()) +
>                       ", but we already have a lease ID of 0x" +
>                       Long.toHexString(fullBlockReportLeaseId) + ". " +
>                       "Overwriting old lease ID.");
>             }
>             fullBlockReportLeaseId = resp.getFullBlockReportLeaseId();
>           }
>         }
>       }
>       // (declarations of cmds and forceFullBr are elided from this excerpt)
>       if ((fullBlockReportLeaseId != 0) || forceFullBr) {
>         // An exception is thrown here while the NN is restarting,
>         // so the reset on the next line is skipped.
>         cmds = blockReport(fullBlockReportLeaseId);
>         fullBlockReportLeaseId = 0;
>       }
>     } catch (RemoteException re) {
>       // Exception handling elided. fullBlockReportLeaseId is NOT reset
>       // here, so the lease from the old NN instance is kept.
>     }
>   } // while (shouldRun())
> } // offerService{code}
>  
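The lease handshake described in the issue (the DN asks the active NN for a lease, sends its full block report with that lease ID, and the NN rejects IDs it does not know) can be modeled with a small sketch. The class and method names below are hypothetical and are not Hadoop's API; the sketch assumes each lease is consumed by one accepted report and leaves out the rate limiting and any other bookkeeping the real NN performs. A restarted NN is modeled as a fresh instance whose pending set is empty.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical model of the NN-side lease check; not Hadoop's API.
class LeaseCheckingNameNode {
  // Lease IDs granted to DNs but not yet used for a full block report.
  private final Set<Long> pendingLeases = new HashSet<>();
  private final Random random = new Random();

  // Grant a non-zero lease ID (0 means "no lease" on the DN side).
  long requestLease() {
    long id;
    do {
      id = random.nextLong();
    } while (id == 0 || pendingLeases.contains(id));
    pendingLeases.add(id);
    return id;
  }

  // Accept the report only if its lease is pending; consume the lease on success.
  boolean acceptFullBlockReport(long leaseId) {
    if (pendingLeases.remove(leaseId)) {
      return true;
    }
    System.out.println("Rejecting full block report: lease 0x"
        + Long.toHexString(leaseId) + " is not in the pending set");
    return false;
  }
}
```

A lease granted by one `LeaseCheckingNameNode` is rejected by a newly constructed one, which is the position the DN ends up in after an NN restart if it keeps its old lease ID.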

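The failure path described for offerService can be reproduced in a toy model. In this hypothetical sketch, `FakeNameNode` stands in for the NN (its `failNextReport` flag simulates the blockReport call failing during a restart) and `BuggyDataNode` keeps its lease ID across the exception and across re-registration, as the quoted loop does. This is a simplification for illustration, not the real structure of BPServiceActor.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the NN; not a Hadoop API.
class FakeNameNode {
  private final Set<Long> pendingLeases = new HashSet<>();
  private long nextLeaseId;
  // When true, the next blockReport call fails as if the NN were restarting.
  boolean failNextReport;

  FakeNameNode(long firstLeaseId) {
    this.nextLeaseId = firstLeaseId;
  }

  long grantLease() {
    long id = nextLeaseId++;
    pendingLeases.add(id);
    return id;
  }

  // Returns false when the lease is "not in the pending set".
  boolean blockReport(long leaseId) throws IOException {
    if (failNextReport) {
      failNextReport = false;
      throw new IOException("NameNode is restarting");
    }
    return pendingLeases.remove(leaseId);
  }
}

// Hypothetical, simplified version of the quoted loop, bug included.
class BuggyDataNode {
  long fullBlockReportLeaseId = 0;
  private FakeNameNode nn;

  BuggyDataNode(FakeNameNode nn) {
    this.nn = nn;
  }

  // Re-registration switches to the new NN but keeps the old lease ID.
  void reRegister(FakeNameNode newNn) {
    nn = newNn;
  }

  // One pass of the loop; returns true if a full block report was accepted.
  boolean iterate(boolean blockReportDue) {
    try {
      if (fullBlockReportLeaseId == 0 && blockReportDue) {
        fullBlockReportLeaseId = nn.grantLease(); // stands in for the heartbeat
      }
      if (fullBlockReportLeaseId != 0) {
        boolean accepted = nn.blockReport(fullBlockReportLeaseId);
        fullBlockReportLeaseId = 0; // never reached if blockReport throws
        return accepted;
      }
    } catch (IOException e) {
      // The bug being illustrated: the stale lease ID survives this exception.
    }
    return false;
  }
}
```

For example, with `new FakeNameNode(100)` as the first NN and `failNextReport` set, the first report fails and the DN still holds lease 100. After re-registering with a new instance it sends lease 100 again, the new instance rejects it, and only then is the lease cleared.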

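The proposed fix (reset fullBlockReportLeaseId to 0 after any exception and after registering with the NN) applied to the same kind of toy model might look like this. `StubNameNode` and `FixedDataNode` are hypothetical names, and the patches attached to HDFS-14314 may place the reset differently inside BPServiceActor.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the NN; not a Hadoop API.
class StubNameNode {
  private final Set<Long> pendingLeases = new HashSet<>();
  private long nextLeaseId;
  // When true, the next blockReport call fails as if the NN were restarting.
  boolean failNextReport;

  StubNameNode(long firstLeaseId) {
    this.nextLeaseId = firstLeaseId;
  }

  long grantLease() {
    long id = nextLeaseId++;
    pendingLeases.add(id);
    return id;
  }

  // Returns false when the lease is "not in the pending set".
  boolean blockReport(long leaseId) throws IOException {
    if (failNextReport) {
      failNextReport = false;
      throw new IOException("NameNode is restarting");
    }
    return pendingLeases.remove(leaseId);
  }
}

// Sketch of the proposed fix: clear the lease on any exception and on re-registration.
class FixedDataNode {
  long fullBlockReportLeaseId = 0;
  private StubNameNode nn;

  FixedDataNode(StubNameNode nn) {
    this.nn = nn;
  }

  void reRegister(StubNameNode newNn) {
    nn = newNn;
    fullBlockReportLeaseId = 0; // a lease from the old NN instance is useless
  }

  // One pass of the loop; returns true if a full block report was accepted.
  boolean iterate(boolean blockReportDue) {
    try {
      if (fullBlockReportLeaseId == 0 && blockReportDue) {
        fullBlockReportLeaseId = nn.grantLease(); // stands in for the heartbeat
      }
      if (fullBlockReportLeaseId != 0) {
        boolean accepted = nn.blockReport(fullBlockReportLeaseId);
        fullBlockReportLeaseId = 0;
        return accepted;
      }
    } catch (IOException e) {
      fullBlockReportLeaseId = 0; // ask for a fresh lease on the next pass
    }
    return false;
  }
}
```

Because the lease is cleared in both places, the first pass after re-registration requests a fresh lease from the new NN instance, and the next full block report is accepted instead of waiting for the next scheduled report.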

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


