hadoop-common-issues mailing list archives

From "Abraham Fine (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-15076) Enhance s3a troubleshooting docs, add perf section
Date Thu, 11 Jan 2018 23:50:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323239#comment-16323239 ]

Abraham Fine edited comment on HADOOP-15076 at 1/11/18 11:49 PM:
-----------------------------------------------------------------

I'm new to this codebase, so I think I was able to point out a few parts of the documentation
that may be confusing to new users.

h3. performance.md
* Would it be possible to change the introduction section from two sequential lists to a table?
That may make it easier to compare S3 and HDFS.
* {{list files a lot. This includes the setup of all queries agains data:}} typo in agains
* {{The MapReduce `FileOutputCommitter`. This also used by Apache Spark.}} I'm not sure what
this sentence is trying to express.
* {{Your problem may appear to be performance, but really it is that the commit protocol is
both slow and unreliable}} Isn't the commit protocol being slow part of "performance"? Can
this be rephrased? 
* {{This is leads to maximum read throughput}} "This will lead to..."?
* Perhaps describe the {{random}} policy before {{normal}}, as one needs to understand {{random}}
before understanding {{normal}} (a config sketch follows this list).
* {{may consume large amounts of resources if each query is working with a different set of
s3 buckets}} Why wouldn't a large amount of resources be consumed if working with the same
set of s3 buckets?
* {{When uploading data, it is uploaded in blocks set by the option}} Consider changing to
"Data is uploaded in blocks set by the option..." (a second sketch after this list shows the
options I believe are involved).
* Extra newline on 451
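
To make the fadvise point above concrete, here is a minimal sketch of how I understand a user
selects the input policy; I'm assuming the key is {{fs.s3a.experimental.input.fadvise}} and that
{{normal}}, {{sequential}} and {{random}} are the accepted values, so please correct me if the
docs use different names:

{code:xml}
<!-- Sketch only: pick the S3A input (fadvise) policy.
     "sequential" favours whole-file scans, "random" favours seek-heavy
     (columnar) reads, and "normal" adapts between the two. -->
<property>
  <name>fs.s3a.experimental.input.fadvise</name>
  <value>random</value>
</property>
{code}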
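
For the block-upload bullet, a similar sketch of the options I believe that sentence refers to
({{fs.s3a.multipart.size}} for the block size, {{fs.s3a.fast.upload.buffer}} for where blocks
are buffered); again, treat the key names and values as my assumption rather than the docs':

{code:xml}
<!-- Sketch only: data is buffered and uploaded in blocks of this size
     (value shown is illustrative). -->
<property>
  <name>fs.s3a.multipart.size</name>
  <value>100M</value>
</property>
<!-- Where pending blocks are buffered before upload: disk, array or bytebuffer. -->
<property>
  <name>fs.s3a.fast.upload.buffer</name>
  <value>disk</value>
</property>
{code}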

h3. troubleshooting_s3a.md
* {{Whatever problem you have, changing the AWS SDK version will not fix things, only change
the stack traces you see.}} Again, I'm new here, so I'm not sure about the history of this
issue, but this section seems a little heavy-handed to me. Does Amazon never release "bug fix"
versions of their client that are API compatible? How can we make this statement with such
certainty? (A sketch of what I understand a proper SDK bump to involve follows this list.)
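
For what it's worth, my (possibly incomplete) understanding of a "safe" SDK change is that it
means bumping the version in the Hadoop build and rebuilding/retesting {{hadoop-aws}}, not
dropping a new JAR on the classpath. Assuming the property in {{hadoop-project/pom.xml}} is
{{aws-java-sdk.version}} (property name and version below are only illustrative):

{code:xml}
<!-- Sketch only: the SDK version the build compiles and tests against.
     Changing it means rebuilding hadoop-aws and rerunning its tests. -->
<aws-java-sdk.version>1.11.199</aws-java-sdk.version>
{code}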


> Enhance s3a troubleshooting docs, add perf section
> --------------------------------------------------
>
>                 Key: HADOOP-15076
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15076
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: documentation, fs/s3
>    Affects Versions: 2.8.2
>            Reporter: Steve Loughran
>            Assignee: Abraham Fine
>         Attachments: HADOOP-15076-001.patch, HADOOP-15076-002.patch, HADOOP-15076-003.patch,
HADOOP-15076-004.patch
>
>
> A recurrent theme in s3a-related JIRAs, support calls etc. is "tried upgrading the AWS
SDK JAR and then I got the error ...". We know here "don't do that", but it's not something
immediately obvious to lots of downstream users who want to be able to drop in the new JAR
to fix things/add new features.
> We need to spell this out quite clearly: "you cannot safely expect to do this. If you
want to upgrade the SDK, you will need to rebuild the whole of hadoop-aws with the maven POM
updated to the latest version, ideally rerunning all the tests to make sure something hasn't
broken."
> Maybe near the top of the index.md file, along with "never share your AWS credentials
with anyone"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

