hadoop-hdfs-issues mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9521) TransferFsImage.receiveFile should account and log separate times for image download and fsync to disk
Date Mon, 07 Mar 2016 11:52:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182915#comment-15182915 ]

Harsh J commented on HDFS-9521:
-------------------------------

+1.

The checkpoint-related cases in one of the tests seemed relevant, but they pass locally on
both JDK7 and JDK8.

{code}
Running org.apache.hadoop.hdfs.TestRollingUpgrade
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 99.737 sec - in org.apache.hadoop.hdfs.TestRollingUpgrade
{code}

Therefore they appear to be flaky rather than at fault here. The other tests appear similarly
unrelated to the log change here (no tests appear to rely on the original message either).

Committing to branch-2 and trunk shortly.

> TransferFsImage.receiveFile should account and log separate times for image download and fsync to disk
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9521
>                 URL: https://issues.apache.org/jira/browse/HDFS-9521
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Minor
>         Attachments: HDFS-9521-2.patch, HDFS-9521-3.patch, HDFS-9521.004.patch, HDFS-9521.patch, HDFS-9521.patch.1
>
>
> Currently, TransferFsImage.receiveFile is logging total transfer time as below:
> {noformat}
> double xferSec = Math.max(
>        ((float)(Time.monotonicNow() - startTime)) / 1000.0, 0.001);
> long xferKb = received / 1024;
> LOG.info(String.format("Transfer took %.2fs at %.2f KB/s", xferSec, xferKb / xferSec));
> {noformat}
> This is really useful, but it only measures the total method execution time, which includes the time taken to download the image and to fsync it to all the namenode metadata directories.
> Sometimes, when troubleshooting these image transfer problems, it is useful to know which part of the process is the bottleneck (network or disk write).
> This patch accounts for the image download time and the fsync time to each disk separately, logging how much time each operation took.
>  
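
The separate accounting described above can be sketched roughly as follows. This is
not the code from the attached patches: the class name SplitTimingSketch, the Timing
holder, the receive() signature and the log wording are all hypothetical, and
System.nanoTime() stands in for Hadoop's Time.monotonicNow() so that the example
compiles without Hadoop on the classpath. The idea is simply to stop one timer when
the last byte has been read and written, and to time each force() call on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; names and structure are assumptions, not the patch.
class SplitTimingSketch {

  /** Bytes received plus the separately measured durations, in milliseconds. */
  static final class Timing {
    final long received;
    final double downloadMs;
    final double[] fsyncMs; // one entry per target file, in the same order

    Timing(long received, double downloadMs, double[] fsyncMs) {
      this.received = received;
      this.downloadMs = downloadMs;
      this.fsyncMs = fsyncMs;
    }
  }

  /** Copies the stream to every target, timing the copy and each fsync apart. */
  static Timing receive(InputStream in, List<File> targets) throws IOException {
    List<FileOutputStream> outs = new ArrayList<>();
    try {
      for (File f : targets) {
        outs.add(new FileOutputStream(f));
      }
      byte[] buf = new byte[8192];
      long received = 0;

      // Phase 1: read from the source and write to every target.
      // The writes are buffered by the OS, so this mostly reflects the source speed.
      long downloadStart = System.nanoTime();
      int n;
      while ((n = in.read(buf)) != -1) {
        for (FileOutputStream out : outs) {
          out.write(buf, 0, n);
        }
        received += n;
      }
      double downloadMs = (System.nanoTime() - downloadStart) / 1e6;

      // Phase 2: force each file to disk and time every target separately.
      double[] fsyncMs = new double[outs.size()];
      for (int i = 0; i < outs.size(); i++) {
        long fsyncStart = System.nanoTime();
        outs.get(i).getChannel().force(true);
        fsyncMs[i] = (System.nanoTime() - fsyncStart) / 1e6;
      }
      return new Timing(received, downloadMs, fsyncMs);
    } finally {
      for (FileOutputStream out : outs) {
        out.close();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    File a = File.createTempFile("fsimage", ".a");
    File b = File.createTempFile("fsimage", ".b");
    a.deleteOnExit();
    b.deleteOnExit();

    // An in-memory stream stands in for the HTTP response body.
    InputStream in = new ByteArrayInputStream(new byte[1 << 20]);
    List<File> targets = Arrays.asList(a, b);
    Timing t = receive(in, targets);

    // The message wording below is illustrative, not the patch's log format.
    System.out.println(String.format(
        "Download of %d bytes took %.2f ms", t.received, t.downloadMs));
    for (int i = 0; i < targets.size(); i++) {
      System.out.println(String.format(
          "Fsync to %s took %.2f ms", targets.get(i), t.fsyncMs[i]));
    }
  }
}
```

With the two phases reported separately, a slow transfer whose download time dominates
points at the network or the sending side, while a large fsync time for one target
points at that particular disk.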



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
