hadoop-hdfs-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode
Date Fri, 08 Dec 2017 09:38:02 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283263#comment-16283263
] 

Chris Douglas commented on HDFS-10285:
--------------------------------------

As [~umamaheswararao] mentioned earlier, HDFS-12090 proposes to build on the SPS. [~virajith]
wrote a [prototype|https://github.com/Microsoft-CISL/hadoop-prototype/tree/SPS-9806] demonstrating
the core of the design. The Mover is not sufficient; we're banking on a more robust solution
for HSM, and wherever it lives, we need something like the SPS.

NameNode load is important, but the decision to implement the balancer as an external process
[predates|https://issues.apache.org/jira/browse/HADOOP-1652] many scalability and performance
improvements. To pick one salient example, it precedes running with a [read-write lock|https://issues.apache.org/jira/browse/HDFS-1093]
by almost three years. The Mover (IIRC) started with the balancer code. Scans are outside
of the NameNode today due to a decade-old analysis, and because to move scans into the NameNode,
features added subsequently would need to be reexamined and possibly redesigned. Also, subsequent
extensions to, and comfort with, the balancer make replacing it unessential. This particular
precedent for scans is not a reliable guide, on its own. We can be confident that adding load
to the NN will drop throughput in some cases, but without benchmarks we don't know whether
those cases are blockers. Have any benchmarks been run, particularly with the SPS disabled?

Also, the state-of-the-art for HSM supports neither sophisticated deployment nor failover.
Many new services and features in YARN are available in preview before they even support secure
deployments. The NameNode acquired these features over years; insisting that services implement
that full complement of capabilities before anyone can be certain the service is _useful_
is not workable, particularly in an open-source project. On that subject, if this approach
doesn't work out, deleting a separate server is much easier than extracting a feature from
the NN. Offhand, I can't think of a single example of the latter. The aspect-oriented fault
injection maybe, but that was both outside the NN and only for testing.

[~rakeshr] started to quantify the impact, which will help either to allay anxiety about
this feature or define thresholds for accepting it. Skimming the implementation, some of this
could be extracted into an external service, but it would not be straightforward. Specifically,
the SPS keeps references to the namesystem and block manager. To [~anu]'s earlier point, "smart"
policies using internal data will be very difficult to extract into a separate service later,
should that become necessary or desirable.

Would it be possible to extract an API for the SPS to other NN components (particularly the
namesystem, block manager, and datanode manager)? That might make the couplings more explicit,
ideally so the interface would be sufficient as an RPC protocol, if the SPS were moved outside
the NN.
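To make the suggestion concrete, a narrow facade like the following is one way the coupling could be made explicit. This is purely an illustrative sketch, not actual HDFS code: the interface name, its methods, and the stub below are all hypothetical, standing in for whatever subset of the namesystem, block manager, and datanode manager the SPS actually needs. The point is that if every call on the facade is expressible over RPC, the SPS could later move out of the NN without rewriting its core logic.

```java
// Hypothetical sketch: a narrow facade the SPS could depend on, instead of
// holding direct references to the namesystem and block manager.
// None of these names are actual HDFS APIs.
import java.util.List;

public class SpsContextSketch {
  /** The only view of the NameNode the SPS would see. If every call here is
   *  expressible as an RPC, the SPS can later be moved outside the NN. */
  public interface SpsContext {
    boolean isFileExist(String path);
    List<String> blockIdsForFile(String path);     // from the namesystem
    String storagePolicyOf(String path);           // desired policy for the path
    void scheduleBlockMove(String blockId, String targetStorageType); // via block manager
  }

  /** SPS core logic written only against the facade, never against NN internals. */
  public static int satisfy(SpsContext ctx, String path, String targetType) {
    if (!ctx.isFileExist(path)) {
      return 0; // nothing to schedule for a missing path
    }
    int scheduled = 0;
    for (String blockId : ctx.blockIdsForFile(path)) {
      ctx.scheduleBlockMove(blockId, targetType);
      scheduled++;
    }
    return scheduled;
  }

  public static void main(String[] args) {
    // In-memory stub standing in for the NameNode internals.
    SpsContext stub = new SpsContext() {
      public boolean isFileExist(String p) { return p.equals("/hot/data"); }
      public List<String> blockIdsForFile(String p) { return List.of("blk_1", "blk_2"); }
      public String storagePolicyOf(String p) { return "ALL_SSD"; }
      public void scheduleBlockMove(String b, String t) { /* would queue a DN command */ }
    };
    System.out.println(satisfy(stub, "/hot/data", "SSD")); // 2
    System.out.println(satisfy(stub, "/missing", "SSD"));  // 0
  }
}
```

With this shape, the "smart" policies would read state only through the facade, so extracting the SPS later would mean implementing the facade over RPC rather than untangling direct object references.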

bq. I’m curious why it isn’t just part of the standard replication monitoring. If the
DN is told to replicate to itself, it just does the storage movement.
In addition to the points that Uma raised, for in-memory and provided replicas, we'd like
to support more than one replica per DN (HDFS-9810). Intra-DN rebalancing also may not benefit
from deleting replicas until the volume is short on space. Copying a temporarily hot replica
to SSD, then back to HDD when it's cold again, is also avoidable overhead if the SSD replica
can simply be deleted. Agreed, it does seem like this is one operation with parameters, not
separate mechanisms.
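The "one operation with parameters" framing might look roughly like the following. This is purely illustrative; none of these types exist in the DN protocol. The idea is that plain replication, inter-storage movement, and lazy intra-DN rebalancing differ only in what happens to the source replica, so that disposition can be a parameter rather than a separate mechanism.

```java
// Illustrative only: one block-transfer command whose parameters subsume
// replication, storage movement, and lazy intra-DN rebalancing.
// These names are hypothetical, not actual HDFS protocol types.
public class TransferCommandSketch {
  public enum SourceDisposition {
    KEEP,             // plain replication: the source replica stays
    DELETE_NOW,       // movement: delete the source once the target is finalized
    DELETE_WHEN_FULL  // lazy cleanup: keep until the volume is short on space
  }

  public record TransferCommand(String blockId,
                                String targetNode,
                                String targetStorageType,
                                SourceDisposition disposition) {
    /** True when this is effectively a move rather than a copy. */
    public boolean isMove() {
      return disposition != SourceDisposition.KEEP;
    }
  }

  public static void main(String[] args) {
    TransferCommand replicate =
        new TransferCommand("blk_1", "dn2", "DISK", SourceDisposition.KEEP);
    TransferCommand move =
        new TransferCommand("blk_1", "dn1", "SSD", SourceDisposition.DELETE_NOW);
    System.out.println(replicate.isMove()); // false
    System.out.println(move.isMove());      // true
  }
}
```

Under this framing, the hot-replica case above is just `DELETE_WHEN_FULL` on the HDD copy, and the HDFS-9810 multi-replica-per-DN case drops out naturally because nothing ties one block to one replica per node.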

> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, HDFS-10285-consolidated-merge-patch-01.patch,
HDFS-10285-consolidated-merge-patch-02.patch, HDFS-10285-consolidated-merge-patch-03.patch,
HDFS-SPS-TestReport-20170708.pdf, Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, Storage-Policy-Satisfier-in-HDFS-May10.pdf,
Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These policies
can be set on a directory or file to specify the user's preference for where the physical blocks
should be stored. When the user sets the storage policy before writing data, the blocks can
take advantage of the policy preference and the physical blocks are stored accordingly.
> If the user sets the storage policy after the file has been written and completed, the blocks
will already have been written under the default storage policy (namely DISK). The user then
has to run the 'Mover tool' explicitly, specifying all such file names as a list. In some
distributed-system scenarios (e.g. HBase) it is difficult to collect all the files and run the
tool, since different nodes can write files independently and the files can have different paths.
> Another scenario: when the user renames a file from a directory with one effective storage
policy (inherited from the parent directory) into a directory with a different effective storage
policy, the inherited policy is not copied from the source; the file instead takes the effective
policy of the destination parent. The rename is just a metadata change in the Namenode, and
the physical blocks still remain under the source storage policy.
> So, tracking all such files across distributed nodes (e.g. region servers) and running the
Mover tool can be difficult for admins.
> Here the proposal is to provide an API in the Namenode itself to trigger storage policy
satisfaction. A daemon thread inside the Namenode would track such calls and dispatch block
movement commands to the DNs.
> Will post the detailed design thoughts document soon.
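For context, the workflow the description contrasts looks roughly like this: today the admin sets the policy and then runs the Mover explicitly over every affected path, whereas the proposal adds a per-path trigger handled inside the Namenode. The `-satisfyStoragePolicy` form reflects what the attached patches propose; treat its exact syntax as illustrative of the feature under review, not settled.

```shell
# Today: set the policy, then run the Mover explicitly over each path.
hdfs storagepolicies -setStoragePolicy -path /hbase/data -policy ALL_SSD
hdfs mover -p /hbase/data

# Proposed: ask the NameNode to satisfy the policy for a path; a daemon
# thread inside the NN tracks the request and issues block-movement
# commands to the DNs.
hdfs storagepolicies -satisfyStoragePolicy -path /hbase/data
```

The difference matters precisely in the HBase-style case above: with the trigger, each writer can ask for satisfaction of its own paths, and no one has to collect a global file list to feed the Mover.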



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


