hadoop-hdfs-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9016) Display upgrade domain information in fsck
Date Fri, 22 Apr 2016 17:21:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254255#comment-15254255 ]

Allen Wittenauer commented on HDFS-9016:

This is basically Hadoop's operability problems coming to the forefront:

* The compatibility guidelines don't offer any real out for CLI output that actually needs
to change based upon the implementation.  So no, technically, a special flag like '-replicadetails'
would not be magically immune.  Once the output is in a released version, it's fixed.  If the
output changes based upon how the system is configured, there are no hints visible anywhere
that this is going to occur.  The compatibility guidelines are the ONLY thread that operations
teams are holding on to, and every time we ignore them, all hell breaks loose.  (Of course, a
lot of the people who work on the code don't realize this because they have no direct lines
of communication, or don't pay much attention when an ops person does point out that
the world broke.  "Feature expediency" wins out over common sense way too often.  HDFS
rolling upgrade is a great example--it actually caused data loss in certain instances because
someone thought it was a great idea to turn a heavily depended-upon NN flag into a no-op
with a success exit code.)

* We don't build many interfaces that can actually be used from scripting languages
(perl, python, ruby, etc.), leaving stdout as the only way the vast majority of ops people are
going to be able to process information.  While the JMX->REST hook was a great help, it's
read-only and still doesn't expose vital information (fsck being the worst offender because,
frankly, it's doing way too much.  Why does it have to be literally the only source for
block-level information?).
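
To illustrate the scripting point above, here is a minimal Python sketch of reading one value
from a JMX-over-HTTP JSON payload instead of scraping stdout.  It assumes the payload has a
top-level "beans" list whose entries carry a "name" key.  The helper names (find_bean,
read_attribute) and the sample payload are made up for illustration, and the bean and attribute
names in the sample are assumptions, not a stability guarantee.

```python
import json


def find_bean(jmx_json_text, bean_name):
    """Return the first bean whose "name" equals bean_name, or None."""
    payload = json.loads(jmx_json_text)
    for bean in payload.get("beans", []):
        if bean.get("name") == bean_name:
            return bean
    return None


def read_attribute(jmx_json_text, bean_name, attribute):
    """Look up one attribute by name; fail loudly instead of guessing."""
    bean = find_bean(jmx_json_text, bean_name)
    if bean is None:
        raise KeyError("bean not found: " + bean_name)
    if attribute not in bean:
        raise KeyError("attribute not found: " + attribute)
    return bean[attribute]


# Illustrative payload only (assumed shape and names, not captured from a cluster).
SAMPLE = """
{"beans": [
  {"name": "Hadoop:service=NameNode,name=FSNamesystemState",
   "NumLiveDataNodes": 3,
   "NumDeadDataNodes": 0}
]}
"""

BEAN = "Hadoop:service=NameNode,name=FSNamesystemState"
live_nodes = read_attribute(SAMPLE, BEAN, "NumLiveDataNodes")
```

Because values are looked up by key, an added field does not shift anything the way an extra
column in stdout does, and a removed field raises an error rather than silently yielding the
wrong value.  None of this helps with information that is only printed by fsck.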

To me, things like the storagepolicy code should have taken on the PMC and tried to revamp
the compatibility guidelines to specifically spell out that command line arguments that generate
output also need to specify stability in their accompanying documentation.  Burying it in a javadoc
is useless.  Unless people are writing code, they don't see that information.  See: metrics,
rack awareness, and a host of other bits that have had real documentation written over the
past 2 years.  All of that information was previously passed along by word of mouth.

That said, I know what the outcome of this JIRA will be:  another cranny where the rules don't
apply, which will come back and bite someone hard in the future.

> Display upgrade domain information in fsck
> ------------------------------------------
>                 Key: HDFS-9016
>                 URL: https://issues.apache.org/jira/browse/HDFS-9016
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-9016.patch
> This will make it easy for people to use fsck to check block placement when upgrade domain
> is enabled.
