hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tsz Wo Nicholas Sze (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs
Date Tue, 27 Feb 2018 22:22:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379394#comment-16379394
] 

Tsz Wo Nicholas Sze commented on HDFS-13102:
--------------------------------------------

Thanks for the update.  Some comments on the 007 patch:
- In SkipListNode, similar to getSkipNode, setSkipDiff and setSkipTo should take care about
resize so the callers don't have to.  We should also remove the addSkipDiff method.
{code}
    public void setSkipDiff(ChildrenDiff cDiff, int level) {
      if (level < skipDiffList.size()) {
        skipDiffList.get(level).setDiff(cDiff);
      } else {
        skipDiffList.add(new SkipDiff(null, cDiff));
      }
    }

    public void setSkipTo(SkipListNode node, int level) {
      for (int i = skipDiffList.size(); i <= level; i++) {
        skipDiffList.add(null);
      }
      skipDiffList.get(level).setSkipTo(node);
    }
{code}

- In addFirst, there are two "if (combined != null)".

- We should not directly use ID_INTEGER_COMPARATOR in DirectoryDiffList.
-* In SkipListNode, change compareTo to
{code}
    public final int compareTo(final Integer that) {
      return diff.compareTo(that);
    }
{code}
-* In getMinListForRange, use
{code}
next.getDiff().compareTo(toSnapshotId) > 0)
{code}


- In getMinListForRange, is it never level < 0 unless the list is inconsistent?  So that
we can clean up the code as follow.
{code}
  @Override
  public List<DirectoryDiff> getMinListForRange(
      int fromIndex, int toIndex, INodeDirectory dir) {
    final List<DirectoryDiff> subList = new ArrayList<>();
    final DirectoryDiff toDiff = get(toIndex);
    final int toSnapshotId = toDiff.getSnapshotId();

    for(SkipListNode current = getNode(fromIndex); current != null; ) {
      int level = current.level();
      SkipListNode next = current.getSkipNode(level);
      while (next == null || next.getDiff().compareTo(toSnapshotId) > 0) {
        level--;
        next = current.getSkipNode(level);
      }

      DirectoryDiff diff = new DirectoryDiff(current.getDiff().getSnapshotId(),
          dir, current.getChildrenDiff(level));
      subList.add(diff);
      current = next;
    }
    return subList;
  }
}
{code}


> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> --------------------------------------------------------------------
>
>                 Key: HDFS-13102
>                 URL: https://issues.apache.org/jira/browse/HDFS-13102
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: snapshots
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>            Priority: Major
>         Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, HDFS-13102.003.patch,
HDFS-13102.004.patch, HDFS-13102.005.patch, HDFS-13102.006.patch, HDFS-13102.007.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a very long time
in case the no of snapshot diffs is quite large for directories. For any directory under a
snapshot, to construct the children list , it needs to combine all the diffs from that particular
snapshot to the last snapshotDiff record and reverseApply to the current children list of
the directory on live fs. This can take  a significant time if the no of snapshot diffs are
quite large and changes per diff is significant.
> This Jira proposes to store the Directory diffs in a SnapshotSkip list, where we store
multi level DirectoryDiffs. At each level, the Directory Diff will be cumulative diff of k
snapshot diffs,
> where k is the level of a node in the list. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message