hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
Date Thu, 29 Sep 2016 01:01:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531432#comment-15531432
] 

Jing Zhao commented on HDFS-10797:
----------------------------------

[~mackrorysd], I agree it would be great to have consistent, user-friendly semantics. To
me, better semantics would be: if the renamed source (which is inside some snapshot) and
the renamed target are both under the directory being counted, we count them once.
Otherwise they are counted separately.

With these semantics, maybe we only need to move your hashset into the context object that is
passed in from the beginning of the counting call, and use it to avoid duplicate counting. What
do you think?
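
To make the idea concrete, here is a minimal sketch of a per-call context that remembers what it
has already counted. It is only an illustration, not the patch: the names CountContext, count and
summarize are hypothetical, and files are reduced to {fileId, bytes} pairs instead of real inodes
and blocks.

```java
import java.util.HashSet;
import java.util.Set;

class DedupCountSketch {
    /** Hypothetical stand-in for the context object passed through one counting call. */
    static final class CountContext {
        private final Set<Long> countedFileIds = new HashSet<>();
        private long totalBytes = 0;

        /** Adds the size only the first time this file id is seen in this call. */
        void count(long fileId, long bytes) {
            // Set.add returns false when the id is already present, so a file
            // reached twice (live tree and snapshot diff) is tallied once.
            if (countedFileIds.add(fileId)) {
                totalBytes += bytes;
            }
        }

        long getTotalBytes() {
            return totalBytes;
        }
    }

    /**
     * One counting call over a directory. Each entry is {fileId, bytes}.
     * currentFiles: files reachable in the live tree under the directory.
     * snapshotDeletedFiles: files the snapshot diff under the directory lists as deleted.
     */
    static long summarize(long[][] currentFiles, long[][] snapshotDeletedFiles) {
        CountContext ctx = new CountContext(); // fresh context per top-level call
        for (long[] f : currentFiles) {
            ctx.count(f[0], f[1]);
        }
        for (long[] f : snapshotDeletedFiles) {
            ctx.count(f[0], f[1]);
        }
        return ctx.getTotalBytes();
    }

    public static void main(String[] args) {
        // File 1 was renamed inside the counted directory: it shows up in the
        // live tree and again as a "deleted" entry in the snapshot diff.
        long[][] current = {{1L, 100L}, {2L, 50L}};
        long[][] deleted = {{1L, 100L}};
        System.out.println(summarize(current, deleted));
    }
}
```

Because the context is created once per top-level call, a rename whose source and target are both
under the counted directory is tallied once, while counting the two directories in separate calls
still counts each side on its own.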

> Disk usage summary of snapshots causes renamed blocks to get counted twice
> --------------------------------------------------------------------------
>
>                 Key: HDFS-10797
>                 URL: https://issues.apache.org/jira/browse/HDFS-10797
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, HDFS-10797.003.patch
>
>
> DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how much disk
> space a snapshot uses by tallying up the files in the snapshot that have since been
> deleted (that way it won't overlap with regular files, whose disk usage is computed separately).
> However, that is determined from a diff that shows moved (to Trash or otherwise) or renamed
> files as a deletion plus a creation, which may overlap with the list of blocks. Only
> the deletion operation is taken into consideration, and this causes those blocks to be
> counted twice in the disk usage tally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

