hadoop-hdfs-dev mailing list archives

From Rich Haase <rha...@pandora.com>
Subject Re: fsck output compatibility question with regard to HDFS-7281
Date Tue, 28 Apr 2015 18:13:35 GMT
I'm late to the discussion, so I apologize if this has already been suggested.  Can't we just
add new option flags to enable new CLI output?  That seems like it would work without changing
the default output.  If compatibility has to be broken, it could be done as part of a major release.
 

E.g.:

Add a -x option to hdfs dfs -ls in 2.7.  If the -x option is popular enough, change the default
CLI output when 3.0 is released.

Just a thought.
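
To make the idea concrete, here is a minimal sketch in Python of output that only changes when a flag is passed. It is not Hadoop code: render_report, parse_args, the "report-sketch" program name, the "Decommissioning replicas" line, and the sample values are all hypothetical, and -x simply mirrors the placeholder option above.

```python
import argparse


def render_report(stats, include_new=False):
    """Build report lines; entries added later appear only when requested."""
    lines = [
        f"Average block replication: {stats['avg_replication']}",
        f"Corrupt blocks: {stats['corrupt_blocks']}",
    ]
    if include_new:
        # Appended after the existing lines, so the default output and the
        # relative order of the old entries are unchanged.
        lines.append(f"Decommissioning replicas: {stats['decom_replicas']}")
    return lines


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="report-sketch")
    # Hypothetical opt-in flag; off by default so existing scripts see no change.
    parser.add_argument("-x", dest="extended", action="store_true")
    return parser.parse_args(argv)


def main(argv):
    args = parse_args(argv)
    # Made-up sample values, for illustration only.
    sample = {"avg_replication": 3.0, "corrupt_blocks": 0, "decom_replicas": 2}
    for line in render_report(sample, include_new=args.extended):
        print(line)
```

Calling main([]) prints only the two existing lines, while main(["-x"]) appends the new one, so a script that parses the default output keeps working until a major release changes the default.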

Sent from my iPhone

> On Apr 28, 2015, at 12:00 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
> 
> I'm surprised by this compatibility requirement. It's quite onerous, since
> it means we can't evolve the output at all. There's no standardized way to
> parse CLI output, so who knows what might break user scripts. e.g. if we
> wanted to display a "+" for ACLs in ls output, that'd be incompatible. Same
> deal for an xattr or encryption bit in ls output. Adding new cluster/node
> state to dfsadmin -report. We have some left and right justified columns in
> cacheadmin output, and changing a column header might add an extra space
> and again break a script. Our CLI output is just not intended to be a
> stable API.
> 
> This is also not something typically upheld by unix-y commands. BSD vs. GNU
> already leads to incompatible flags and output. Most of these commands
> haven't been changed in 20 years, but that doesn't constitute a compat
> guarantee.
> 
> One example I like is git. It splits its commands into "porcelain" and
> "plumbing", where plumbing is meant for script use. An excerpt from the man
> page:
> 
>       The interface (input, output, set of options and the semantics) to
> these low-level commands are meant to be a lot more stable than Porcelain
> level commands, because these commands are primarily for scripted use. The
> interface to Porcelain commands on the other hand are subject to change in
> order to improve the end user experience.
> 
> This is something I'd like to follow for our own commands. We provide
> different APIs for machine consumption vs. human consumption, and make this
> clear in the compat guide. Of course, we should still be judicious when
> changing the human output, but I just don't see a good way forward without
> relaxing our current compat guidelines.
> 
> The other thing to consider is providing supported Java APIs for the
> commonly-parsed shell commands. This is something we have much more
> experience with.
> 
> Best,
> Andrew
> 
>> On Fri, Apr 24, 2015 at 1:17 PM, Yongjun Zhang <yzhang@cloudera.com> wrote:
>> 
>> Thanks Chris, good clarification!
>> 
>> --Yongjun
>> 
>> On Fri, Apr 24, 2015 at 12:36 PM, Chris Nauroth <cnauroth@hortonworks.com>
>> wrote:
>> 
>>> Metrics/JMX is covered by our compatibility guidelines:
>>> http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-common/Compatibility.html#MetricsJMX
>>> 
>>> 
>>> Metrics/JMX is similar to our usage of Protocol Buffers/JSON that I
>>> mentioned.  It supports backwards-compatible evolution if the change is
>>> done correctly.  Adding new fields/beans is compatible.  Changing the
>>> names or data types of existing fields/beans is incompatible.  Deleting
>>> existing fields/beans is incompatible.
>>> 
>>> --Chris Nauroth
>>> 
>>> 
>>> 
>>> 
>>>> On 4/24/15, 11:19 AM, "Yongjun Zhang" <yzhang@cloudera.com> wrote:
>>>> 
>>>> Thanks Allen and Chris!
>>>> 
>>>> What about adding new entries to jmx report? Somehow I had the impression
>>>> that if we add new entries to it, it's not considered incompatible. Often
>>>> within the same minor release, we want to add new info to jmx report
>>>> instead of waiting for a major release.
>>>> 
>>>> For CLI like fsck, maybe we can add a new command line option to enable the
>>>> change, and if the command line option is not enabled, don't change the
>>>> output, so we can still commit the change within the same release line?
>>>> 
>>>> Thanks.
>>>> 
>>>> --Yongjun
>>>> 
>>>> 
>>>> On Fri, Apr 24, 2015 at 11:05 AM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
>>>> 
>>>>> Allen, thank you for calling this out.  I was not aware of this part of
>>>>> the compatibility guidelines.  I committed one of those fsck changes in
>>>>> HDFS-7933.  I see you flagged the issue as incompatible, which agrees with
>>>>> the compatibility guidelines.
>>>>> 
>>>>> "Changing the path of a command, removing or renaming command line
>>>>> options, the order of arguments, or the command return code and output
>>>>> break compatibility and may adversely affect users."
>>>>> 
>>>>> Most of this intuitively makes sense.  Even ignorant of the compatibility
>>>>> guidelines, I would have known to push back on patches that change the
>>>>> path, remove or rename existing options, or change the order of arguments.
>>>>> 
>>>>> HDFS-7933 was an example of an output change, and I find this part of the
>>>>> compatibility guidelines much more challenging.  We need to be able to
>>>>> evolve CLI output within a release line.  On the protocol side, our use of
>>>>> Protocol Buffers and JSON supports evolution if we use it correctly.  How
>>>>> can we achieve the equivalent for the CLI?  For example, can we turn
>>>>> HDFS-7933 into a backwards-compatible change if it preserves the old
>>>>> output, and only adds the new information if the user passes a new
>>>>> argument, such as -count-decom?
>>>>> 
>>>>> Are there other specific issues that you have in mind for CLI
>>>>> incompatibility problems?  Let's see if we can find a way to amend them to
>>>>> satisfy the compatibility guidelines.
>>>>> 
>>>>> --Chris Nauroth
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 4/24/15, 1:02 AM, "Allen Wittenauer" <aw@altiscale.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> On Apr 24, 2015, at 5:53 AM, Yongjun Zhang <yzhang@cloudera.com> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> Basically we are adding two additional lines to the report (as
>>>>>>> highlighted above).
>>>>>>> 
>>>>>>> Theoretically, if a tool parses the existing fsck report and expects the
>>>>>>> "Corrupt blocks" entry to be right after the "Average block replication"
>>>>>>> entry, then the change would break the tool. But is this really a
>>>>>>> concern?
>>>>>>> 
>>>>>>> I guess this is not really a concern, so I don't think this change is
>>>>>>> incompatible, but would anyone please comment?
>>>>>> 
>>>>>>      If it changes the output of a CLI command, it's an incompatible change:
>>>>>> http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-common/Compatibility.html#Command_Line_Interface_CLI
>>>>>> 
>>>>>> 
>>>>>>      Other changes to fsck have been punted to 3.x for the *exact same
>>>>>> reason*. In other cases, committers have violated these rules in branch-2
>>>>>> (not just to fsck, but to all sorts of other command line bits, even
>>>>>> removing command options!) to the point that our compatibility guarantees
>>>>>> are pretty much useless.  It's open season on nuking the ecosystem. :(
>>>>>> 
>>>>>>      People not following the compat rules is one of the reasons I started
>>>>>> building my own changes and release notes, because we have too many
>>>>>> committers either accidentally committing incompatible changes or just
>>>>>> outright lying about them.  (... and, as much as I hate to say it, the HDFS
>>>>>> project is easily the biggest offender.)
>> 
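
On the Metrics/JMX point earlier in the thread (adding fields/beans is compatible; renaming, retyping, or deleting them is not), that rule could be checked mechanically. The following is only a rough Python sketch under the assumption that each release's fields can be dumped as a name-to-type mapping; find_incompatibilities and the field names are hypothetical, not part of Hadoop.

```python
def find_incompatibilities(old_fields, new_fields):
    """Compare two {field_name: type_name} mappings and list incompatible changes.

    Fields present only in new_fields are additions and are treated as compatible.
    A rename is indistinguishable from a removal plus an addition, so it is
    reported as a removal of the old name.
    """
    problems = []
    for name, old_type in old_fields.items():
        if name not in new_fields:
            problems.append(f"removed or renamed: {name}")
        elif new_fields[name] != old_type:
            problems.append(
                f"type changed: {name} ({old_type} -> {new_fields[name]})"
            )
    return problems


# Hypothetical field sets for two releases.
old = {"blocks_total": "long", "state": "string"}
added_only = {"blocks_total": "long", "state": "string", "blocks_pending": "long"}
retyped = {"blocks_total": "int", "state": "string"}

print(find_incompatibilities(old, added_only))
print(find_incompatibilities(old, retyped))
```

The first call reports nothing because the change is purely additive; the second reports the type change on blocks_total. Free-form CLI text has no equivalent schema to compare, which is the gap the thread is discussing.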
