hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5431) support cachepool-based quota management in path-based caching
Date Fri, 06 Dec 2013 19:59:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841628#comment-13841628 ]

Colin Patrick McCabe commented on HDFS-5431:

    if (in.readBoolean()) {
    if (in.readBoolean()) {
    if (in.readBoolean()) {
    if (in.readBoolean()) {
    if (in.readBoolean()) {
    if (in.readBoolean()) {

I don't think the backwards-compatibility stuff here is really going to work.  The problem
is that if we add more booleans, the old code won't know they're there.  It will then interpret
those bytes as something else, which could cause some really bad results.

I think the best way to do this is to start with a 32-bit word, which we can treat as a bitfield.
 We can then load or not load field N according to whether bit N is set.  If there are bits
set that we don't know how to interpret, we can bail out with a nice error message rather
than trying to load garbage and possibly corrupting the fsimage.  We probably should use
this approach for cache directives as well.
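A rough sketch of what that could look like (not the actual patch; the flag and field names here are made up for illustration): write a 32-bit bitfield ahead of the optional fields, load field N only when bit N is set, and refuse to proceed when an unknown bit is set.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class FeatureFlags {
    // Assumed field names, purely illustrative.
    static final int FLAG_LIMIT  = 1 << 0;
    static final int FLAG_MODE   = 1 << 1;
    static final int KNOWN_FLAGS = FLAG_LIMIT | FLAG_MODE;

    static void write(DataOutput out, Long limit, Short mode) throws IOException {
        int flags = 0;
        if (limit != null) flags |= FLAG_LIMIT;
        if (mode != null)  flags |= FLAG_MODE;
        out.writeInt(flags);                      // bitfield goes first
        if (limit != null) out.writeLong(limit);  // field N written iff bit N is set
        if (mode != null)  out.writeShort(mode);
    }

    static long[] read(DataInput in) throws IOException {
        int flags = in.readInt();
        int unknown = flags & ~KNOWN_FLAGS;
        if (unknown != 0) {
            // Bail out with a clear error rather than load garbage.
            throw new IOException("unrecognized feature flags: 0x"
                + Integer.toHexString(unknown));
        }
        long limit = ((flags & FLAG_LIMIT) != 0) ? in.readLong()  : -1L;
        long mode  = ((flags & FLAG_MODE)  != 0) ? in.readShort() : -1L;
        return new long[] { limit, mode };
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(new DataOutputStream(buf), 4096L, null);  // only the limit field present
        long[] fields = read(new DataInputStream(
            new ByteArrayInputStream(buf.toByteArray())));
        System.out.println("limit=" + fields[0] + " mode=" + fields[1]);
    }
}
```

The point is that an old reader that sees a bit it doesn't recognize fails loudly up front, instead of silently consuming the new fields' bytes as whatever it expects next in the stream.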

        int mode = Integer.parseInt(modeString, 8);
        info.setMode(new FsPermission((short)mode));
hey, there's a {{Short.parseShort}} too :)
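For what it's worth, {{Short.parseShort}} takes a radix too, so the octal parse can yield a {{short}} directly without the int-to-short cast (standalone sketch, outside of Hadoop):

```java
public class ModeParse {
    public static void main(String[] args) {
        String modeString = "755";                     // octal permission string
        short mode = Short.parseShort(modeString, 8);  // radix 8, no cast needed
        System.out.println(mode);                      // 493 decimal == 0755 octal
        // the real code would then do: info.setMode(new FsPermission(mode));
    }
}
```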

About terminology: isn't "maximum" a better name for what we're implementing here than "quota"?
 If we implement something more sophisticated later, it could get confusing if we just use
the term "quota" here.  I also think we should rip out weight completely if we're not going
to support it any more.  I see a few places where "weight" is lingering now.  The feature
flag stuff should allow us to add it forwards-compatibly (although not backwards-compatibly)
in the future, if we want to.  I feel the same way about "reservation."

I'm not sure that we want a cache directive addition to fail when the maximum has been exceeded.
 The problem is, there isn't any good way to implement this kind of simple check for more
sophisticated quota methods like fair share or minimum share, etc.  Also, the check depends
on what we think the sizes of files and directories in the cluster are, and those may change.
 The result is very inconsistent behavior from the user's point of view.  For
example, maybe he can add cache directives if a datanode has not come up, but can't add them
once it comes up and we determine the full size of a certain file.  Or maybe he could add
them by manually editing the edit log, but not from the command-line.  It just feels inconsistent.
 I would rather we teach people to rely on looking at {{bytesNeeded}} versus {{bytesCached}}
to determine if they had enough space.

I wonder if we should add another metric that somehow allows users to disambiguate between
bytes not cached because of maximums / quotas / other "executive decision" and bytes not cached
because the DN had an issue.  Right now all the user can do is subtract bytesCached from bytesNeeded
and see that there is some gap, but he would have to check the logs to know why.

> support cachepool-based quota management in path-based caching
> --------------------------------------------------------------
>                 Key: HDFS-5431
>                 URL: https://issues.apache.org/jira/browse/HDFS-5431
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Andrew Wang
>         Attachments: hdfs-5431-1.patch
> We should support cachepool-based quota management in path-based caching.

This message was sent by Atlassian JIRA
