hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14876) Create downstream developer docs from the compatibility guidelines
Date Mon, 16 Oct 2017 11:21:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205748#comment-16205748
] 

Steve Loughran commented on HADOOP-14876:
-----------------------------------------

h4. Privacy scope

* add: sometimes things are marked as private when they end up being essential (example: UserGroupInformation).
In such situations, raise an issue with the team to see if we can add some form of @Public tag
\cite{HADOOP-10776}.

Now, what about the fact that the distributed shell example uses (or at least did when I last
looked) private code?
e.g. org.apache.hadoop.io.DataOutputBuffer, the timeline plugin, NMClientAsyncImpl, ... You
can look at the imports, and probably 20% of the class imports (not interfaces, yarn records)
are tagged as private/limited private. We are not in a position to tell people not to use
@Private code, not when we consider doing so essential even for basic example YARN apps.

* What does it mean if something is tagged as LimitedPrivate for one app (esp. HBase &
Hive, which aren't within our own codebase)? To me, that says "we know these things get used
downstream, or we've added them as a special secret back-door". But who gets to choose which
apps can actually use it? LimitedPrivate + outside our codebase == public, which is something
we should acknowledge when scoping things. And LimitedPrivate(MapReduce) often means "every
YARN app needs these".

* What does it mean if a release removes/changes something you depended on which was tagged
private/limited private? Complain. The complaint may get ignored, but the change may have been
made without awareness of how widely the code was used.
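
The import audit described above could be automated with reflection. The following is only a minimal sketch: the three annotations are stand-ins declared locally so that it compiles without Hadoop on the classpath (the real ones live in org.apache.hadoop.classification.InterfaceAudience), the example classes are hypothetical, and it assumes the annotations are retained at runtime.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.List;

public class AudienceAudit {

    // Stand-ins for Hadoop's InterfaceAudience annotations (assumption:
    // runtime retention, so reflection can see them).
    @Retention(RetentionPolicy.RUNTIME) public @interface Public {}
    @Retention(RetentionPolicy.RUNTIME) public @interface Private {}
    @Retention(RetentionPolicy.RUNTIME) public @interface LimitedPrivate { String[] value(); }

    // Hypothetical classes standing in for things a downstream app imports.
    @Public public static class StableApi {}
    @Private public static class InternalBuffer {}
    @LimitedPrivate({"MapReduce"}) public static class SemiInternalClient {}

    /** True if the class is tagged private or limited private. */
    public static boolean isRestricted(Class<?> c) {
        return c.isAnnotationPresent(Private.class)
            || c.isAnnotationPresent(LimitedPrivate.class);
    }

    /** Returns the simple names of the restricted classes among those used. */
    public static List<String> restricted(Class<?>... used) {
        List<String> names = new ArrayList<>();
        for (Class<?> c : used) {
            if (isRestricted(c)) {
                names.add(c.getSimpleName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints the classes an application should raise an issue about.
        System.out.println(restricted(
            StableApi.class, InternalBuffer.class, SemiInternalClient.class));
    }
}
```

A check like this only tells a downstream project where it depends on restricted code; it does not make that dependency avoidable.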

h4. Semantics

I take this bit very seriously, having been deeply involved in the original paragraphs, and
being an aficionado of all D.L. Parnas's writings on the notion of "interface". As far as I'm
concerned, the de facto definitions of semantics are set in our unit tests ("what we expect")
and in those of widely used applications ("what HBase and Hive expect"). We know that if we
break the latter then people complain, and, while we may do so, it's not something we want to do.

L113. yeah right. It's usually the first port of call, & if you think otherwise, you're
not writing enough downstream code. 

The original Compatibility.md calls out that some bits of the system have non-normative specifications,
e.g. the filesystem. I would consider that significantly more normative than the javadocs, most of
which are vague aspirations of functionality. Usually the javadocs don't have any mention
of concurrency, which matters a lot; for that you do end up delving into the source and/or
using it in a way which appears to work (HDFS's use of input streams), when in fact they're
just using accidental bits of the semantics which we are now expected to maintain.


+maybe mention StreamCapabilities.hasCapability as a way of determining if FS streams offer
a feature; say it's more to support variants in back ends than a way for us to remove
things. But do mention: it is good practice to check for new things rather than assume that if HDFS
implements it, it works everywhere.
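
A minimal sketch of that "check, don't assume" pattern follows. The StreamCapabilities interface here is a local stand-in mirroring the hasCapability(String) method named above, so the example compiles without Hadoop; DurableStream and the helper methods are hypothetical, and the "hflush"/"hsync" capability names are assumed for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class CapabilityProbe {

    /** Local stand-in for org.apache.hadoop.fs.StreamCapabilities. */
    public interface StreamCapabilities {
        boolean hasCapability(String capability);
    }

    /** Hypothetical stream whose back end declares durable-flush support. */
    public static class DurableStream extends ByteArrayOutputStream
            implements StreamCapabilities {
        @Override
        public boolean hasCapability(String capability) {
            return "hflush".equals(capability) || "hsync".equals(capability);
        }
    }

    /** True only if the stream explicitly declares the capability. */
    public static boolean supports(OutputStream out, String capability) {
        return out instanceof StreamCapabilities
            && ((StreamCapabilities) out).hasCapability(capability);
    }

    /**
     * Writes a record and reports whether the stream claims it can sync
     * durably, so the caller can choose a fallback instead of assuming
     * HDFS-like behaviour from every back end.
     */
    public static boolean writeRecord(OutputStream out, byte[] record)
            throws IOException {
        out.write(record);
        out.flush();
        return supports(out, "hsync");
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3};
        System.out.println(writeRecord(new DurableStream(), data));
        System.out.println(writeRecord(new ByteArrayOutputStream(), data));
    }
}
```

The point of probing is that a stream which does not implement the interface at all is treated as lacking the feature, rather than as implicitly matching HDFS.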

L160. "The audit log format may not change incompatibly between major releases." ?? Is that "may change"
or "must not"?

L189. Need to explain how to differentiate log chaff from "real" output. Indeed, I'm curious
myself.


L208. We don't require log4j though; other back ends may be supportable.

L229. Nothing is called "s3.*" any more.

L298 "No new (exposed) dependency will be added to Hadoop between major releases."

Can't make that guarantee. Qualify "via the shaded clients"


Things that we've glossed over

* No statement on supported operating systems, filesystems, x86 parts, IPv4 vs v6. If I code
for Windows, how long will hadoop-client work there? What if I target SPARC?

* Concurrency: say "we try not to make things worse"? Degradations are considered defects,
except when it's just some accidental side effect of excessive logging?


> Create downstream developer docs from the compatibility guidelines
> ------------------------------------------------------------------
>
>                 Key: HADOOP-14876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14876
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 3.0.0-beta1
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: HADOOP-14876.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

