hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7056) Snapshot support for truncate
Date Tue, 04 Nov 2014 19:30:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196655#comment-14196655 ]

Jing Zhao commented on HDFS-7056:

Thanks for working on this, [~shv] and [~zero45]. So far I have gone through the namenode
snapshot part (INode, INodeFile, FileDiff, and FileDiffList), and I will continue reviewing
the remaining parts.
# Looks like {{findLaterSnapshotWithBlocks}} and {{findEarlierSnapshotWithBlocks}} are always
coupled with {{FileDiff#getBlocks}}. Maybe we can combine them, so that logic like the
following can be wrapped into two methods such as {{findBlocksAfter}} and {{findBlocksBefore}}?
+    FileDiff diff = getDiffs().getDiffById(snapshot);
+    BlockInfo[] snapshotBlocks = diff == null ? getBlocks() : diff.getBlocks();
+    if(snapshotBlocks != null)
+      return snapshotBlocks;
+    // Blocks are not in the current snapshot
+    // Find next snapshot with blocks present or return current file blocks
+    diff = getDiffs().findLaterSnapshotWithBlocks(diff.getSnapshotId());
+    snapshotBlocks = (diff == null) ? getBlocks() : diff.getBlocks();
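For illustration, the two wrappers might look roughly like the sketch below ({{findBlocksAfter}}/{{findBlocksBefore}} are the hypothetical names from this comment, and the types are simplified stand-ins, not the real HDFS classes):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: FileDiff here is a simplified stand-in for the HDFS class,
// and block ids are plain strings instead of BlockInfo objects.
class FileDiffListSketch {
  static class FileDiff {
    final int snapshotId;
    final String[] blocks;              // stand-in for BlockInfo[]
    FileDiff(int id, String[] b) { snapshotId = id; blocks = b; }
  }

  private final List<FileDiff> diffs = new ArrayList<>(); // ascending snapshot id
  private String[] currentBlocks = new String[0];         // blocks of the live file

  void add(FileDiff d) { diffs.add(d); }
  void setCurrentBlocks(String[] b) { currentBlocks = b; }

  /** Block list visible at snapshotId: the first non-null block list at or
   *  after that diff, falling back to the current file blocks. */
  String[] findBlocksAfter(int snapshotId) {
    for (FileDiff d : diffs) {
      if (d.snapshotId >= snapshotId && d.blocks != null) {
        return d.blocks;
      }
    }
    return currentBlocks;
  }

  /** Latest non-null block list strictly before snapshotId, or null if none. */
  String[] findBlocksBefore(int snapshotId) {
    String[] found = null;
    for (FileDiff d : diffs) {
      if (d.snapshotId >= snapshotId) {
        break;
      }
      if (d.blocks != null) {
        found = d.blocks;
      }
    }
    return found;
  }
}
```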
# Since the same block can be included in different file diffs, we may have duplicated blocks
in {{collectedBlocks}}. Will this lead to duplicated records in the invalid block list?
  public void destroyAndCollectSnapshotBlocks(
      BlocksMapUpdateInfo collectedBlocks) {
    for(FileDiff d : asList())
      d.destroyAndCollectSnapshotBlocks(collectedBlocks);
  }
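If duplicates are indeed possible, one obvious guard (purely a sketch, not the real {{BlocksMapUpdateInfo}} API) would be to route the collected blocks through a set before they reach the invalidate list:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch only: block ids are plain strings here, not the real BlockInfo objects.
class SnapshotBlockCollector {
  /** Collects blocks from all diffs through a set, so a block shared by
   *  several file diffs is recorded for deletion only once. */
  static List<String> collectUnique(List<String[]> blocksPerDiff) {
    Set<String> seen = new LinkedHashSet<>();
    for (String[] blocks : blocksPerDiff) {
      if (blocks == null) {
        continue;            // this diff stored no block list
      }
      for (String b : blocks) {
        seen.add(b);         // duplicates across diffs collapse here
      }
    }
    return new ArrayList<>(seen);
  }
}
```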
# INodeFile#destroyAndCollectBlocks destroys the whole file, including the file diffs for
snapshots. Thus we do not need to call {{collectBlocksAndClear}} or to define a new
{{destroyAndCollectAllBlocks}} method. Instead, we can simply destroy all the blocks belonging
to the current file first, and then check whether calling {{sf.getDiffs().destroyAndCollectSnapshotBlocks}} is necessary.
+    FileWithSnapshotFeature sf = getFileWithSnapshotFeature();
+    if(sf == null || getDiffs().asList().isEmpty()) {
+      destroyAndCollectAllBlocks(collectedBlocks, removedINodes);
+      return;
+    }
+    sf.getDiffs().destroyAndCollectSnapshotBlocks(collectedBlocks);
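The suggested ordering could be sketched like this (hypothetical helper with simplified types; blocks are plain strings rather than {{BlockInfo}}):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the suggested ordering, not the real INodeFile API.
class DestroyOrderSketch {
  /** First collect every block of the current file for deletion, then the
   *  blocks that are kept alive only by snapshot diffs (if any). */
  static List<String> destroyAndCollectBlocks(String[] currentBlocks,
                                              String[][] snapshotOnlyBlocks) {
    List<String> collected = new ArrayList<>();
    // 1. Destroy all blocks belonging to the current file.
    for (String b : currentBlocks) {
      collected.add(b);
    }
    // 2. Only then visit the snapshot diffs, if the file carries the feature.
    if (snapshotOnlyBlocks != null) {
      for (String[] diffBlocks : snapshotOnlyBlocks) {
        for (String b : diffBlocks) {
          if (!collected.contains(b)) {  // skip blocks already collected
            collected.add(b);
          }
        }
      }
    }
    return collected;
  }
}
```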
# How do we currently calculate/update quota for a file? I guess we need to update the quota
calculation algorithm for an INodeFile here.
# I guess the semantics of {{findEarlierSnapshotWithBlocks}} is to find the FileDiff that satisfies:
1) its block list is not null, and 2) its snapshot id is less than the given {{snapshotId}}.
However, if the given {{snapshotId}} is not {{CURRENT_STATE_ID}}, the current implementation
may return a FileDiff whose snapshot id is >= the given {{snapshotId}} (since {{getDiffById}}
may return a diff with a snapshot id greater than the given id).
  public FileDiff findEarlierSnapshotWithBlocks(int snapshotId) {
    FileDiff diff = (snapshotId == Snapshot.CURRENT_STATE_ID) ?
        getLast() : getDiffById(snapshotId);
    BlockInfo[] snapshotBlocks = null;
    while(diff != null) {
      snapshotBlocks = diff.getBlocks();
      if(snapshotBlocks != null)
        break;
      int p = getPrior(diff.getSnapshotId(), true);
      diff = (p == Snapshot.NO_SNAPSHOT_ID) ? null : getDiffById(p);
    }
    return diff;
  }
# Still for {{findEarlierSnapshotWithBlocks}}: because {{getPrior}} is currently a {{log\(n\)}}
operation, the worst-case time complexity can thus be {{n log\(n\)}}. Considering that the
snapshot diff list is usually not big (we have an upper limit on the total number of snapshots),
we may consider directly doing a linear scan over the file diff list.
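A single backward scan over the diff list could look like the sketch below (parallel arrays stand in for the diff list; the sentinel constant is a stand-in, not necessarily the real value of {{Snapshot.CURRENT_STATE_ID}}; the strictly-less-than check follows the semantics described in the previous point):

```java
// Sketch: the diff list is modeled as parallel arrays ordered by ascending
// snapshot id; CURRENT_STATE_ID is modeled as a large sentinel value.
class LinearScanSketch {
  static final int CURRENT_STATE_ID = Integer.MAX_VALUE;

  /** Walks the list backward once, O(n) overall instead of repeated getPrior
   *  lookups. Returns the id of the latest diff whose snapshot id is strictly
   *  less than snapshotId and whose block list is present, or null if none. */
  static Integer findEarlierSnapshotWithBlocks(int[] snapshotIds,
                                               boolean[] hasBlocks,
                                               int snapshotId) {
    for (int i = snapshotIds.length - 1; i >= 0; i--) {
      if (snapshotIds[i] < snapshotId && hasBlocks[i]) {
        return snapshotIds[i];
      }
    }
    return null;
  }
}
```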
# In INode.java, why do we need the following change?
   public final boolean isInLatestSnapshot(final int latestSnapshotId) {
-    if (latestSnapshotId == Snapshot.CURRENT_STATE_ID) {
+    if (latestSnapshotId == Snapshot.CURRENT_STATE_ID ||
+        latestSnapshotId == Snapshot.NO_SNAPSHOT_ID) {
# Nit: need to add "\{" and "\}" for the while loop according to our current coding style.
Similar for several other places (e.g., {{FileDiffList#destroyAndCollectSnapshotBlocks}}).
+    while(i < currentBlocks.length && i < snapshotBlocks.length &&
+          currentBlocks[i] == snapshotBlocks[i])
+      i++;
+    // Collect the remaining blocks of the file
+    while(i < currentBlocks.length)
+      collectedBlocks.addDeleteBlock(currentBlocks[i++]);
# Minor: In the following code, instead of calling {{getDiffById}} to search for the file
diff, we can let {{AbstractINodeDiffList#saveSelf2Snapshot}} return the diff it just finds/creates.
  public void saveSelf2Snapshot(int latestSnapshotId, INodeFile iNodeFile,
      INodeFileAttributes snapshotCopy, boolean withBlocks)
          throws QuotaExceededException {
    super.saveSelf2Snapshot(latestSnapshotId, iNodeFile, snapshotCopy);
    if(! withBlocks) return;
    final FileDiff diff = getDiffById(latestSnapshotId);
    // Store blocks if this is the first update
# Nit: also, for the above code, maybe it's more direct to have
if (withBlocks) {
  final FileDiff diff = getDiffById(latestSnapshotId);
  // Store blocks if this is the first update
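The idea of returning the diff from {{saveSelf2Snapshot}} could be sketched as follows (simplified stand-in types, not the real {{AbstractINodeDiffList}}; a map stands in for the diff list):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a map stands in for the real diff list.
class DiffListSketch {
  static class FileDiff {
    final int snapshotId;
    String[] blocks;                 // stand-in for BlockInfo[]
    FileDiff(int id) { snapshotId = id; }
  }

  private final Map<Integer, FileDiff> byId = new HashMap<>();

  /** Returns the diff it finds or creates, so the caller can store the block
   *  list on it directly instead of doing a second getDiffById lookup. */
  FileDiff saveSelf2Snapshot(int latestSnapshotId) {
    return byId.computeIfAbsent(latestSnapshotId, FileDiff::new);
  }
}
```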

> Snapshot support for truncate
> -----------------------------
>                 Key: HDFS-7056
>                 URL: https://issues.apache.org/jira/browse/HDFS-7056
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx
> Implementation of truncate in HDFS-3107 does not allow truncating files which are in
> a snapshot. It is desirable to be able to truncate and still keep the old file state of the
> file in the snapshot.
