hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3245) Add metrics and web UI for cluster version summary
Date Thu, 22 Aug 2013 17:13:54 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747668#comment-13747668 ]

Kihwal Lee commented on HDFS-3245:

If a nodeReg has a null version, countSoftwareVersions() will count it as well. The version check
in NameNodeRpcServer will blow up first, so I don't think this can happen there, but if the method
is used in a different context, it might not work as intended.
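
To illustrate the null case, here is a minimal sketch of a count that skips entries without a version. The class and method names are hypothetical and the input is simplified to a list of version strings; this is not the code from the patch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the counting logic; names are illustrative only.
final class VersionCountSketch {
  /** Counts versions, ignoring nodes whose version is null. */
  static Map<String, Integer> countVersions(List<String> versions) {
    Map<String, Integer> counts = new HashMap<>();
    for (String version : versions) {
      if (version == null) {
        continue; // a node without a version must not end up under a null key
      }
      Integer n = counts.get(version);
      counts.put(version, n == null ? 1 : n + 1);
    }
    return counts;
  }
}
```

With this guard the result never contains a null key, regardless of where the method is called from.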

It would be better to update the map in place instead of recounting and reconstructing it
on every removal or registration of a DN. Whenever the version field changes (including the
cases where it is reset to null), the map could simply be updated. A complete recount may be
done on refresh only.
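
One way the in-place update could look, as a sketch under assumptions: the tracker class and its onVersionChange() hook are hypothetical, and locking is left out here to keep the bookkeeping visible.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker; not the actual DatanodeManager code.
final class InPlaceVersionCounts {
  private final Map<String, Integer> counts = new HashMap<>();

  /** Call whenever a node's version changes; either argument may be null. */
  void onVersionChange(String oldVersion, String newVersion) {
    decrement(oldVersion);
    increment(newVersion);
  }

  private void increment(String version) {
    if (version == null) {
      return; // nothing to count for a node without a version
    }
    Integer n = counts.get(version);
    counts.put(version, n == null ? 1 : n + 1);
  }

  private void decrement(String version) {
    if (version == null) {
      return;
    }
    Integer n = counts.get(version);
    if (n == null) {
      return; // version was never counted, nothing to undo
    }
    if (n <= 1) {
      counts.remove(version); // drop the entry so stale versions disappear
    } else {
      counts.put(version, n - 1);
    }
  }

  int count(String version) {
    Integer n = counts.get(version);
    return n == null ? 0 : n;
  }
}
```

Registration maps to onVersionChange(null, v), removal to onVersionChange(v, null), and a reset to null is handled by the same call.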

Concurrency control becomes a bit different with this approach because the hash map is updated
in place. Since the hash map should be small and getDatanodesSoftwareVersions() isn't called
too frequently, a copy could be created under a proper lock and returned. I expect up to two
different versions in the map, but several versions may be present in certain cases.
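
A sketch of the copy-under-lock idea, assuming a dedicated lock object and hypothetical method names (the real code may well use a different lock):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the live map; names and lock choice are assumptions.
final class VersionSnapshotSketch {
  private final Object lock = new Object();
  private final Map<String, Integer> counts = new HashMap<>();

  /** Writer path: mutates the live map in place while holding the lock. */
  void add(String version) {
    synchronized (lock) {
      Integer n = counts.get(version);
      counts.put(version, n == null ? 1 : n + 1);
    }
  }

  /** Reader path: copies under the same lock; callers never see the live map. */
  Map<String, Integer> snapshot() {
    synchronized (lock) {
      return new HashMap<>(counts);
    }
  }
}
```

Because the map holds only a handful of entries, the copy is cheap, and the caller can iterate over it without holding the lock.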

Since resizing will be rare and inexpensive due to the small size, we could explicitly
initialize the hash map with a capacity of, say, 4 and a load factor of 0.75.
It will then resize to 8 if there are more than 3 different versions, which should be rare.
It probably won't grow beyond 8.
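
The arithmetic behind those numbers, as a small sketch: java.util.HashMap rehashes once the number of entries exceeds capacity times load factor. The class and constant names below are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative constants for the sizing discussed above.
final class CapacitySketch {
  static final int INITIAL_CAPACITY = 4;
  static final float LOAD_FACTOR = 0.75f;

  /** Largest entry count that still fits without a resize: 4 * 0.75 = 3. */
  static int resizeThreshold() {
    return (int) (INITIAL_CAPACITY * LOAD_FACTOR);
  }

  /** Creates the version map with the explicit capacity and load factor. */
  static Map<String, Integer> newVersionMap() {
    return new HashMap<>(INITIAL_CAPACITY, LOAD_FACTOR);
  }
}
```

Adding a fourth distinct version pushes the size past this threshold of 3 and triggers the resize.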

The UI and JMX seem fine.

> Add metrics and web UI for cluster version summary
> --------------------------------------------------
>                 Key: HDFS-3245
>                 URL: https://issues.apache.org/jira/browse/HDFS-3245
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Ravi Prakash
>         Attachments: HDFS-3245.branch-2.patch, HDFS-3245.branch-2.patch, HDFS-3245.patch
> With the introduction of protocol compatibility, once HDFS-2983 is committed, we have
> the possibility that different nodes in a cluster are running different software versions.
> To aid operators, we should add the ability to summarize the status of versions in the cluster,
> so they can easily determine whether a rolling upgrade is in progress or if some nodes "missed"
> an upgrade (e.g., maybe they were out of service when the software was updated).

