hadoop-hdfs-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
Date Wed, 05 Oct 2016 17:07:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549352#comment-15549352 ]

Xiao Chen commented on HDFS-10797:
----------------------------------

Thanks Sean. Agreed, correctness is the goal. My comment about performance concerned the nature
of the context-based summary, which was added to prevent a {{du}} from locking the NN. And as we
both agreed, this patch doesn't touch that.

Also, about {{getIncludedNodes}}: it seems we don't call it anywhere. Maybe we can remove
the getter method and, when a use case emerges, let that new change add it back?

> Disk usage summary of snapshots causes renamed blocks to get counted twice
> --------------------------------------------------------------------------
>
>                 Key: HDFS-10797
>                 URL: https://issues.apache.org/jira/browse/HDFS-10797
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, HDFS-10797.003.patch,
>                      HDFS-10797.004.patch, HDFS-10797.005.patch, HDFS-10797.006.patch,
>                      HDFS-10797.007.patch, HDFS-10797.008.patch, HDFS-10797.009.patch
>
>
> DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how much disk
> usage is used by a snapshot by tallying up the files in the snapshot that have since been
> deleted (that way it won't overlap with regular files whose disk usage is computed separately).
> However that is determined from a diff that shows moved (to Trash or otherwise) or renamed
> files as a deletion and a creation operation that may overlap with the list of blocks. Only
> the deletion operation is taken into consideration, and this causes those blocks to get represented
> twice in the disk usage tallying.
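
To make the double counting in the description above concrete, here is a minimal toy sketch in Java. It is not the HDFS code and not what any of the attached patches do: the class {{SnapshotDuSketch}}, its methods, the block IDs and the 128-byte block size are all hypothetical, chosen only for illustration. A file is modelled as a list of block IDs; a rename after the snapshot shows up as a "deleted" entry in the diff whose blocks are still referenced by a live file.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; none of these names are real HDFS classes or methods.
final class SnapshotDuSketch {
    // Assumed fixed block size in bytes, purely for illustration.
    static final long BLOCK_SIZE = 128L;

    // Sums every block of every file, so a block listed twice is counted twice.
    static long naiveUsage(List<List<Long>> liveFiles, List<List<Long>> deletedInDiff) {
        long total = 0;
        for (List<Long> blocks : liveFiles) {
            total += blocks.size() * BLOCK_SIZE;
        }
        for (List<Long> blocks : deletedInDiff) {
            total += blocks.size() * BLOCK_SIZE;
        }
        return total;
    }

    // Counts each block ID at most once, whichever list it appears in.
    static long dedupedUsage(List<List<Long>> liveFiles, List<List<Long>> deletedInDiff) {
        Set<Long> seen = new HashSet<>();
        long total = 0;
        for (List<Long> blocks : liveFiles) {
            total += countNew(blocks, seen);
        }
        for (List<Long> blocks : deletedInDiff) {
            total += countNew(blocks, seen);
        }
        return total;
    }

    // Adds BLOCK_SIZE for each block ID that has not been seen before.
    private static long countNew(List<Long> blocks, Set<Long> seen) {
        long bytes = 0;
        for (Long id : blocks) {
            if (seen.add(id)) {
                bytes += BLOCK_SIZE;
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        // Live tree: the renamed file (blocks 1, 2) and an unrelated file (block 3).
        List<List<Long>> live = Arrays.asList(Arrays.asList(1L, 2L), Arrays.asList(3L));
        // Snapshot diff: the old name of the renamed file, recorded as a deletion.
        List<List<Long>> deleted = Arrays.asList(Arrays.asList(1L, 2L));

        System.out.println("naive:   " + naiveUsage(live, deleted));
        System.out.println("deduped: " + dedupedUsage(live, deleted));
    }
}
```

In this toy model the naive tally is larger than the deduplicated one by exactly the size of the renamed file's blocks. Remembering what has already been counted is only one possible remedy, shown here because it is easy to follow.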



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


