hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests
Date Fri, 06 May 2016 19:53:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274641#comment-15274641 ]

Colin Patrick McCabe commented on HADOOP-13028:
-----------------------------------------------

{code}
<property>
  <name>fs.s3a.readahead.range</name>
  <value>65536</value>
  <description>Bytes to read ahead during a seek() before closing and
  re-opening the S3 HTTP connection.</description>
</property>
{code}
Hmm, should this be {{fs.s3a.readahead.default}}?  It seems like this is the default used if the
user doesn't call {{FSDataInputStream#setReadahead}}.
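
For illustration, one plausible reading of the property description is sketched below: a forward seek that lands within the readahead range is served by reading and discarding bytes on the open connection, and anything else closes and re-opens it. The class {{ReadaheadPolicy}} and the method {{canSkipForward}} are hypothetical names, not part of S3A, and treating the boundary as inclusive is an assumption.

```java
/**
 * Hypothetical helper, not the S3A implementation: decides whether a seek
 * can stay on the current connection given a readahead range in bytes.
 */
final class ReadaheadPolicy {
  private final long readaheadRange;

  ReadaheadPolicy(long readaheadRange) {
    if (readaheadRange < 0) {
      throw new IllegalArgumentException("readahead range must be >= 0");
    }
    this.readaheadRange = readaheadRange;
  }

  /**
   * Returns true when the target is ahead of the current position and no
   * further away than the readahead range (inclusive, by assumption).
   * Backward seeks always return false, i.e. they would need a reopen.
   */
  boolean canSkipForward(long currentPos, long targetPos) {
    long diff = targetPos - currentPos;
    return diff >= 0 && diff <= readaheadRange;
  }
}
```

Under these assumptions, with the 65536 value from the snippet above, a forward seek of exactly 65536 bytes stays on the connection, while 65537 bytes or any backward seek would trigger a reopen.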

{{S3AInputStream#closed}}: it seems like this should be an {{AtomicBoolean}}.  Otherwise two
threads could both enter this code block, right?
{code}
    if (!closed) {
      closed = true;
      super.close();
      closeStream("close() operation", this.contentLength);
      streamStatistics.close();
    }
{code}
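
A minimal sketch of the {{AtomicBoolean}} approach, to show the shape of the guard. {{GuardedStream}} is a hypothetical stand-in rather than the real {{S3AInputStream}}, and the counter stands in for the one-time cleanup work ({{closeStream}}, {{streamStatistics.close()}}) in the snippet above.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a stream whose close() must run cleanup once. */
class GuardedStream implements AutoCloseable {
  private final AtomicBoolean closed = new AtomicBoolean(false);
  // Counts how many times the cleanup body actually executed.
  private final AtomicInteger cleanupRuns = new AtomicInteger();

  @Override
  public void close() {
    // compareAndSet flips false -> true atomically, so only one caller
    // can see it succeed; concurrent or repeated callers fall through.
    if (closed.compareAndSet(false, true)) {
      cleanupRuns.incrementAndGet();
    }
  }

  boolean isClosed() {
    return closed.get();
  }

  int cleanupRuns() {
    return cleanupRuns.get();
  }
}
```

With a plain {{boolean}}, two threads can both read {{false}} before either writes {{true}}. The check-and-set in {{compareAndSet}} is a single atomic step, which closes that window without needing a {{synchronized}} block.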

{code}
  public S3AInstrumentation.InputStreamStatistics getStreamStatistics() {
{code}
Maybe this should be called {{getS3StreamStatistics}}, reflecting the fact that this API is S3-specific?

Is it really necessary to put statistics information into the {{toString}} methods of the
streams?  It seems like this could lead to compatibility woes, and we have the API described
above to provide this information anyway.

> add low level counter metrics for S3A; use in read performance tests
> --------------------------------------------------------------------
>
>                 Key: HADOOP-13028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13028
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, metrics
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, HADOOP-13028-004.patch,
> HADOOP-13028-005.patch, HADOOP-13028-006.patch, HADOOP-13028-007.patch, HADOOP-13028-008.patch,
> HADOOP-13028-009.patch, HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch,
> HADOOP-13028-branch-2-010.patch, org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt,
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive, and closing connections
> may be expensive (a sign of a regression).
> S3A FS and individual input streams should have counters of the number of open/close/failure+reconnect
> operations, and timers of how long things take. This can be used downstream to measure efficiency
> of the code (how often connections are being made), connection reliability, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

