hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode
Date Thu, 07 Dec 2017 08:42:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281503#comment-16281503 ]

Rakesh R commented on HDFS-10285:
---------------------------------

Thanks a lot [~anu] for your time and comments.

bq. This is the most critical concern that I have. In one of the discussions with SPS developers,
they pointed out to me that they want to make sure an SPS move happens within a reasonable
time. Apparently, I was told that this is a requirement from HBase. If you have such a need,
then the first thing an admin will do is to increase this queue size. Slowly, but steadily
SPS will eat into more and more memory of Namenode
Increasing the Namenode queue will not help speed up block movements. It is the Datanode that performs the actual block movements, and the Datanode bandwidth is what needs to be tuned to speed them up. Hence there is no point in increasing the Namenode queue; in fact, that would simply pile up pending tasks on the Namenode side.

Let me try to quantify the memory usage of the Namenode queue:
Assume there are 1 million directories and users have invoked the {{dfs#satisfyStoragePolicy(path)}} API on all of them, which is a huge amount of data movement and not a regular case. Further assume that, without understanding what increasing the queue size actually buys, some careless user has set the queue size to a high value of 1,000,000. Each API call adds an xattr to represent the pending movement. Also, the NN maintains an in-memory list of the pending directory inode ids (long values) whose policies must be satisfied. Each xattr takes 15 chars, {{"system.hdfs.sps"}}, for the marking (note: the branch code currently uses {{system.hdfs.satisfy.storage.policy}}; we will shorten it to {{system.hdfs.sps}}). With that, the total space occupied per entry is (xattr + inodeId) size.

1,000,000 * (30 bytes + 64 bytes) = 1,000,000 * 94 bytes = 94,000,000 bytes = 89.65 MB ≈ 90 MB, which I feel is a small fraction of the Namenode heap, and even that arises only in the misconfigured scenario where that many inode ids are queued up.
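
To make that arithmetic concrete, here is a rough back-of-envelope sketch (an illustration only; it assumes 2 bytes per xattr-name character and a flat 64 bytes of per-inode tracking overhead, exactly the figures used above):

{code:java}
public class SpsQueueMemoryEstimate {
  public static void main(String[] args) {
    // Worst-case sketch from the estimate above: 1 million queued directories,
    // each costing one xattr name plus per-inode tracking state on the Namenode.
    long queuedDirs = 1_000_000L;
    long xattrNameBytes = "system.hdfs.sps".length() * 2; // 15 chars ~= 30 bytes
    long inodeTrackingBytes = 64;                          // assumed per-entry overhead
    long totalBytes = queuedDirs * (xattrNameBytes + inodeTrackingBytes);
    System.out.printf("~%.2f MB%n", totalBytes / (1024.0 * 1024.0)); // prints ~89.65 MB
  }
}
{code}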

bq. We have an existing pattern Balancer, Mover, DiskBalancer where we have the "scan and
move tools" as an external feature to namenode. I am not able to see any convincing reason
for breaking this pattern.
- {{Scanning}} - For scanning, CPU is the main resource consumed. IIUC from your previous comments, I'm glad you agreed that CPU is not an issue, so scanning is not a concern. If we run SPS outside the Namenode, it has to make additional RPC calls for the SPS work, and on a switchover the newly active SPS service has to blindly scan the entire namespace to figure out the xattrs. To handle such switchover scenarios we would have to come up with some awkward workaround, like writing the progress into an xattr or a file somewhere so that the new active SPS service can read it from there and continue. With this, I feel it is better to keep the scanning logic at the NN.
FYI, the NN already has an existing feature, EDEK re-encryption, which also does scanning, and we reuse that same code in SPS.
Also, I'm reiterating the point that SPS does not scan files on its own; the user has to call the API to satisfy a particular file (see the usage sketch after this list).

- {{Moving blocks}} - This essentially assigns the work to the Datanode. The Namenode already has several pieces of logic that drive block movement - ReplicationMonitor, EC reconstruction, decommissioning, etc. We have also added a throttling mechanism for the SPS block movements so that they do not affect the existing data movements.

- AFAIK, DiskBalancer runs entirely on the Datanode and is really a Datanode utility, so I don't think it compares to SPS. As for the Balancer, it doesn't need any input file paths; it balances the HDFS cluster based on utilization. The Balancer can run independently because it takes no file path argument and the user is generally not waiting for the balancing work to finish, whereas SPS is exposed to the user via the HSM feature. HSM is completely bound to the Namenode, which today only allows users to set the storage policy - a state change at the NN - while the NN takes no action to actually satisfy the policy. Requiring yet another service to be started for the HSM feature may be an overhead in practice, and HSM adoption may suffer. My personal opinion is that the Balancer/DiskBalancer running outside is not, by itself, a good reason for keeping SPS outside.
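
As noted in the scanning point above, SPS only acts on paths the user explicitly asks about. A minimal client-side usage sketch of the per-path API discussed in this thread (the {{satisfyStoragePolicy}} call is the one referenced above; the path and policy name are hypothetical, and the exact signature in the branch may differ slightly):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SatisfyStoragePolicyExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path dir = new Path("/hbase/data"); // hypothetical directory
    try (DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf)) {
      // Existing HSM behaviour: setting the policy is only a metadata change at the NN.
      dfs.setStoragePolicy(dir, "COLD");
      // Per-path SPS call: asks the NN to schedule the block moves needed to
      // satisfy the policy for this path; nothing outside this path is scanned.
      dfs.satisfyStoragePolicy(dir);
    }
  }
}
{code}

On such a call the NN only records the marker xattr and queues the inode id, which is exactly the per-entry cost counted in the memory estimate above.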

> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, HDFS-10285-consolidated-merge-patch-01.patch,
HDFS-10285-consolidated-merge-patch-02.patch, HDFS-10285-consolidated-merge-patch-03.patch,
HDFS-SPS-TestReport-20170708.pdf, Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, Storage-Policy-Satisfier-in-HDFS-May10.pdf,
Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These policies can be set on a directory/file to specify the user's preference for where the physical blocks should be stored. When the user sets the storage policy before writing data, the blocks can take advantage of the policy preference and are physically stored accordingly.
> If the user sets the storage policy after the file has been written and closed, the blocks will already have been written with the default storage policy (namely DISK). The user then has to run the ‘Mover tool’ explicitly, specifying all such file names as a list. In some distributed system scenarios (e.g. HBase) it is difficult to collect all the files and run the tool, since different nodes can write files separately and the files can have different paths.
> Another scenario is when the user renames a file from a directory with one effective storage policy (inherited from the parent directory) into a directory with a different storage policy. The rename does not copy the inherited storage policy from the source, so the file's effective policy comes from the destination parent directory. This rename operation is just a metadata change in the Namenode; the physical blocks still remain placed according to the source storage policy.
> So, tracking all such file names, driven by business logic across distributed nodes (e.g. region servers), and then running the Mover tool could be difficult for admins.
> Here the proposal is to provide an API from the Namenode itself to trigger storage policy satisfaction. A daemon thread inside the Namenode should track such calls and turn them into movement commands sent to the DNs.
> Will post the detailed design document soon.




