hadoop-hdfs-dev mailing list archives

From "Gangumalla, Uma" <uma.ganguma...@intel.com>
Subject [DISCUSS] Merge Storage Policy Satisfier (SPS) [HDFS-10285] feature branch to trunk
Date Tue, 25 Jul 2017 06:35:14 GMT
Dear All,

I would like to propose merging the Storage Policy Satisfier (SPS) feature into trunk. We have
been working on this feature for the last several months, and it has received contributions
from several companies. All of the feature development happened smoothly and collaboratively
in JIRA.

Detailed design document is available in JIRA: Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf<https://issues.apache.org/jira/secure/attachment/12873642/Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf>
Test report attached to JIRA: HDFS-SPS-TestReport-20170708.pdf<https://issues.apache.org/jira/secure/attachment/12876256/HDFS-SPS-TestReport-20170708.pdf>

Short description of the feature:
   The Storage Policy Satisfier feature aims to help distributed HDFS applications schedule
block movements easily.
   When a storage policy changes, the user can invoke the satisfyStoragePolicy API to trigger
the block storage movements.
   Block movement tasks are assigned to datanodes, so the movements happen in a distributed fashion.
   Block-level movement tracking is also distributed to the DNs to avoid load on the Namenode.
   A coordinator Datanode tracks all the blocks associated with a blockCollection and sends
the consolidated final result to the Namenode.
   If the movement result is a failure, the Namenode re-schedules the block movements.
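
The flow described above can be sketched as a toy model in plain Java. This is only an illustration, not code from the HDFS-10285 branch: the class names SpsFlowSketch and Coordinator, the mover callback, the maxAttempts limit, and the choice to retry the whole block collection on failure are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of the SPS flow; none of these types are real HDFS classes. */
class SpsFlowSketch {

    /** Hypothetical coordinator: collects per-block results for one block collection. */
    static final class Coordinator {
        private final int expectedBlocks;
        private final Map<Long, Boolean> results = new HashMap<>();

        Coordinator(int expectedBlocks) {
            this.expectedBlocks = expectedBlocks;
        }

        /** Records whether the move of one block succeeded. */
        void report(long blockId, boolean moved) {
            results.put(blockId, moved);
        }

        /** Consolidated result: true only if every block reported a successful move. */
        boolean consolidatedSuccess() {
            return results.size() == expectedBlocks && !results.containsValue(false);
        }
    }

    /**
     * Namenode side of the sketch: schedule the moves, read the coordinator's
     * consolidated result, and re-schedule on failure. The maxAttempts cap is
     * an assumption of this example. Returns the number of attempts used,
     * or -1 if the moves never fully succeeded.
     */
    static int satisfy(List<Long> blockIds, Predicate<Long> mover, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Coordinator coordinator = new Coordinator(blockIds.size());
            for (long blockId : blockIds) {
                // In the real feature the datanodes perform the moves; here a callback stands in.
                coordinator.report(blockId, mover.test(blockId));
            }
            if (coordinator.consolidatedSuccess()) {
                return attempt;
            }
        }
        return -1;
    }
}
```

In this sketch the Namenode side only ever sees one consolidated result per block collection, which is the point of pushing block-level tracking down to the coordinator.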

Development branch is: HDFS-10285
No of JIRAs Resolved: 38
Pending JIRAs: 4 (I don’t think they are blockers for merge)

We have posted a combined patch for easy merge review. Jenkins test results look good
on the combined patch.
Quick stats on the combined patch:
  67 files changed, 7001 insertions(+), 45 deletions(-)
  Added/modified test cases: ~70

Thanks to all helpers, namely Andrew Wang, Anoop Sam John, Du Jingcheng, Ewan Higgs, Jing
Zhao, Kai Zheng, Rakesh R, Ramakrishna, Surendra Singh Lilhore, Uma Maheswara Rao G, Wei
Zhou, and Yuanbo Liu. Without these members' efforts, this feature might not have reached this
stage.

We will continue to work on the following future work items:

  1.  Presently the user has to set and satisfy the policy in separate RPC calls. The idea is
to provide a hybrid API, dfs#setStoragePolicy(src, policy), which would set and satisfy
in one RPC call to the namenode (Reference HDFS-11669)
  2.  Presently BlockStorageMovementCommand sends all the blocks under a trackID in a single
heartbeat response. If there are many blocks under a given trackID (for example, a file containing
many blocks), that bulk information goes to the DN in a single network call, which comes with
a lot of overhead. One idea is to use smaller batches of BlockMovingInfo in the block storage
movement command (Reference HDFS-11125)
  3.  Build a mechanism to throttle the number of concurrent moves at the datanode.
  4.  Allow specifying an initial delay in seconds before the source file is scheduled for satisfying
the storage policy. For example, in HBase the interval between archiving (moving files between
different storages) and deleting a file is not large. In that case it may not be necessary to
schedule the satisfy-policy task immediately.
  5.  SPS-related metrics to be covered.
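
To make items 2 and 3 more concrete, here is a minimal sketch in plain Java of what batching and throttling could look like. It is not a proposal for the actual implementation: SpsFutureWorkSketch, toBatches, MoveThrottle, and the batchSize and maxConcurrentMoves parameters are hypothetical names, and the real BlockMovingInfo type is replaced by a generic element type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

/** Illustrative only; these are not classes from the HDFS-10285 branch. */
class SpsFutureWorkSketch {

    /** Item 2: split the block list of one trackID into batches of at most batchSize. */
    static <T> List<List<T>> toBatches(List<T> blocks, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < blocks.size(); start += batchSize) {
            int end = Math.min(start + batchSize, blocks.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(blocks.subList(start, end)));
        }
        return batches;
    }

    /** Item 3: cap the number of concurrent moves with a counting semaphore. */
    static final class MoveThrottle {
        private final Semaphore permits;

        MoveThrottle(int maxConcurrentMoves) {
            this.permits = new Semaphore(maxConcurrentMoves);
        }

        /** Runs the move if a slot is free; returns false so the caller can retry later. */
        boolean tryMove(Runnable move) {
            if (!permits.tryAcquire()) {
                return false;
            }
            try {
                move.run();
                return true;
            } finally {
                permits.release();
            }
        }

        /** Number of move slots currently unused. */
        int freeSlots() {
            return permits.availablePermits();
        }
    }
}
```

With batching, each heartbeat response would carry one small batch instead of every block under the trackID. The non-blocking tryMove is one possible design choice; a blocking acquire would also work if the datanode can afford to wait for a slot.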

So, I feel this branch is ready for merge into trunk. Please provide your feedback. If there
are no objections, I will proceed to voting.

Uma & Rakesh
