hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14973) Log StorageStatistics
Date Tue, 24 Oct 2017 10:02:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216638#comment-16216638 ]

Steve Loughran commented on HADOOP-14973:
-----------------------------------------

First, Sean: tag the versions, give the title a hint that it's for S3, mark it as an improvement, and move it under
HADOOP-14831 so it can be tracked for Hadoop 3.1.

Second, you haven't called FileSystem.toString() for a while, have you? Or FSDataInputStream.toString()?
Because it already prints all of this. How else do you think all the seek optimisation work was debugged?
{code}
2017-10-10 16:23:47,050 [ScalaTest-main-running-S3ADataFrameSuite] INFO  s3.S3ADataFrameSuite
(Logging.scala:logInfo(54)) - Duration of scan result list = 2,118,450 nS
2017-10-10 16:23:47,050 [ScalaTest-main-running-S3ADataFrameSuite] INFO  s3.S3ADataFrameSuite
(Logging.scala:logInfo(54)) - FileSystem S3AFileSystem{uri=s3a://hwdev-steve-ireland-new,
workingDir=s3a://hwdev-steve-ireland-new/user/stevel, inputPolicy=random, partSize=8388608,
enableMultiObjectsDelete=true, maxKeys=5000, readAhead=262144, blockSize=1048576, multiPartThreshold=2147483647,
serverSideEncryptionAlgorithm='NONE', blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@64f6964f,
metastore=NullMetadataStore, authoritative=false, useListV1=false, boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=25,
available=25, waiting=0}, activeCount=0}, unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@60291e59[Running,
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], statistics {182521443
bytes read, 39004 bytes written, 207 read ops, 0 large read ops, 76 write ops}, metrics {{Context=S3AFileSystem}
{FileSystemId=e62eeb1a-cced-473b-95f3-06c9910604ad-hwdev-steve-ireland-new} {fsURI=s3a://hwdev-steve-ireland-new}
{files_created=0} {files_copied=0} {files_copied_bytes=0} {files_deleted=0} {fake_directories_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} {op_copy_from_local_file=0}
{op_exists=0} {op_get_file_status=1} {op_glob_status=0} {op_is_directory=0} {op_is_file=0}
{op_list_files=1} {op_list_located_status=0} {op_list_status=0} {op_mkdirs=0} {op_rename=0}
{object_copy_requests=0} {object_delete_requests=0} {object_list_requests=2} {object_continue_list_requests=0}
{object_metadata_requests=2} {object_multipart_aborted=0} {object_put_bytes=0} {object_put_requests=0}
{object_put_requests_completed=0} {stream_write_failures=0} {stream_write_block_uploads=0}
{stream_write_block_uploads_committed=0} {stream_write_block_uploads_aborted=0} {stream_write_total_time=0}
{stream_write_total_data=0} {committer_commits_created=0} {committer_commits_completed=0}
{committer_jobs_completed=0} {committer_jobs_failed=0} {committer_tasks_completed=0} {committer_tasks_failed=0}
{committer_bytes_committed=0} {committer_bytes_uploaded=0} {committer_commits_failed=0} {committer_commits_aborted=0}
{committer_commits_reverted=0} {s3guard_metadatastore_put_path_request=1} {s3guard_metadatastore_initialization=0}
{s3guard_metadatastore_retry=0} {s3guard_metadatastore_throttled=0} {store_io_throttled=0}
{object_put_requests_active=0} {object_put_bytes_pending=0} {stream_write_block_uploads_active=0}
{stream_write_block_uploads_pending=0} {stream_write_block_uploads_data_pending=0} {S3guard_metadatastore_put_path_latencyNumOps=0}
{S3guard_metadatastore_put_path_latency50thPercentileLatency=0} {S3guard_metadatastore_put_path_latency75thPercentileLatency=0}
{S3guard_metadatastore_put_path_latency90thPercentileLatency=0} {S3guard_metadatastore_put_path_latency95thPercentileLatency=0}
{S3guard_metadatastore_put_path_latency99thPercentileLatency=0} {S3guard_metadatastore_throttle_rateNumEvents=0}
{S3guard_metadatastore_throttle_rate50thPercentileFrequency (Hz)=0} {S3guard_metadatastore_throttle_rate75thPercentileFrequency
(Hz)=0} {S3guard_metadatastore_throttle_rate90thPercentileFrequency (Hz)=0} {S3guard_metadatastore_throttle_rate95thPercentileFrequency
(Hz)=0} {S3guard_metadatastore_throttle_rate99thPercentileFrequency (Hz)=0} {stream_read_fully_operations=0}
{stream_opened=0} {stream_bytes_skipped_on_seek=0} {stream_closed=0} {stream_bytes_backwards_on_seek=0}
{stream_bytes_read=0} {stream_read_operations_incomplete=0} {stream_bytes_discarded_in_abort=0}
{stream_close_operations=0} {stream_read_operations=0} {stream_aborted=0} {stream_forward_seek_operations=0}
{stream_backward_seek_operations=0} {stream_seek_operations=0} {stream_bytes_read_in_close=0}
{stream_read_exceptions=0} }}
- DataFrames
2017-10-10 16:23:47,051 [ScalaTest-main-running-S3ADataFrameSuite] INFO  s3.S3ADataFrameSuite
(Logging.scala:logInfo(54)) - Cleaning s3a://hwdev-steve-ireland-new/cloud-integration/DELAY_LISTING_ME/S3ADataFrameSuite
S3AOrcRelationSuite:
{code}

See? That's from a Spark {{logInfo(s"Stats $filesystem")}} call, with no changes made to the Spark codebase at all.
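
Something like the following untested sketch is all it takes from any code with a Hadoop classpath (the bucket name and the bit of work done beforehand are placeholders); the whole trick is that {{FileSystem.toString()}} embeds the statistics, so any log statement that stringifies the filesystem instance gets them:

{code}
// Untested sketch; the bucket name is a placeholder. FileSystem.toString()
// embeds the statistics, so printing/logging the FS instance is all you need.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object LogS3AStats {
  def main(args: Array[String]): Unit = {
    val root = new Path("s3a://my-bucket/")
    val fs = FileSystem.get(root.toUri, new Configuration())
    fs.listStatus(root)       // do some work so the counters move
    println(s"Stats $fs")     // same dump as in the log above
  }
}
{code}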

W.r.t. the broader stats, what is needed is aggregate collection of stats from the executors, where the report for a specific executor contains the stats for that task rather than the statistics summary for the entire life of the shared process. Same for Tez, I expect.

* The _SUCCESS file in the HADOOP-13786 patch collects the VM stats and aggregates them; it doesn't do what is needed here, which is per-thread collection/diff.
* There's been discussion in Spark PRs about improving how executor stats are collected (currently it just does a {{listFiles(task-output-dir, true).map(status => status.getLen).sum()}}). Tasks should be able to return a full String -> Long map of that task's stats and have them aggregated; a sketch of that merge follows this list.
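
To be concrete about the aggregation step, here is a rough sketch of the merge only; the per-task maps and whatever reporting hook would produce them are hypothetical, as Spark does not expose this today:

{code}
// Hypothetical aggregation: each task returns its own Map[String, Long] of
// counter deltas and the driver merges them by key. Only the merge is shown;
// the hook producing these maps does not exist yet.
object StatsAggregation {
  def mergeTaskStats(perTask: Seq[Map[String, Long]]): Map[String, Long] =
    perTask.foldLeft(Map.empty[String, Long]) { (acc, task) =>
      task.foldLeft(acc) { case (merged, (key, value)) =>
        merged.updated(key, merged.getOrElse(key, 0L) + value)
      }
    }

  def main(args: Array[String]): Unit = {
    val merged = mergeTaskStats(Seq(
      Map("object_put_requests" -> 3L, "stream_write_total_data" -> 8388608L),
      Map("object_put_requests" -> 1L)))
    println(merged)  // Map(object_put_requests -> 4, stream_write_total_data -> 8388608)
  }
}
{code}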

This is broader than just S3; it needs to cover all stores, plus let committers & executors add more data.

[~liuml07] has done some of the initial work on chaining up StorageStats.

Anyway, if all you want is logging of the S3A stats, toString() already does it, so I'd close this as WORKSFORME. However, we do need to glue together the entire storage-statistics mechanism, finishing off Mingliang's work. Well volunteered!
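
For reference, the per-key values are already reachable programmatically on Hadoop 2.8+; a rough, uncompiled sketch of walking them (bucket name is a placeholder), both for one FS instance and via the process-wide registry:

{code}
// Rough sketch, Hadoop 2.8+ APIs: enumerate the statistics as name/value
// pairs rather than parsing toString(). Bucket name is a placeholder.
import scala.collection.JavaConverters._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, GlobalStorageStatistics, Path}

object DumpStorageStatistics {
  def main(args: Array[String]): Unit = {
    val root = new Path("s3a://my-bucket/")
    val fs = FileSystem.get(root.toUri, new Configuration())
    fs.listStatus(root)                                   // generate some traffic

    // statistics of this FileSystem instance
    fs.getStorageStatistics.getLongStatistics.asScala.foreach { s =>
      println(s"${s.getName} = ${s.getValue}")
    }

    // everything registered process-wide
    GlobalStorageStatistics.INSTANCE.iterator().asScala.foreach { stats =>
      println(s"-- ${stats.getName} --")
      stats.getLongStatistics.asScala.foreach(s => println(s"${s.getName} = ${s.getValue}"))
    }
  }
}
{code}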


> Log StorageStatistics
> ---------------------
>
>                 Key: HADOOP-14973
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14973
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>
> S3A is currently storing much more detailed metrics via StorageStatistics than are logged in a MapReduce job. Eventually, it would be nice to get Spark, MapReduce and other workloads to retrieve and store these metrics, but it may be some time before they all do that. I'd like to consider having S3A publish the metrics itself in some form. This is tricky, as S3A has no daemon but lives inside various other processes.
> Perhaps writing to a log file at some configurable interval and on close() would be the best we could do. Other ideas would be welcome.
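
(As a bare-bones illustration of that interval-plus-close() idea only, not of any actual patch: a single background thread could periodically print the FileSystem, whose toString() embeds the stats, and print once more at shutdown. Interval and bucket name below are placeholders.)

{code}
// Illustration only of the "log at an interval and on close()" idea from the
// description, not the actual patch. Interval and bucket name are placeholders.
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object PeriodicStatsLogger {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Path("s3a://my-bucket/").toUri, new Configuration())
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(
      new Runnable { def run(): Unit = println(s"Stats $fs") },
      0, 60, TimeUnit.SECONDS)          // "configurable interval"
    sys.addShutdownHook {
      println(s"Final stats $fs")       // the on-close() dump
      scheduler.shutdownNow()
    }
    // ... application work would run here ...
  }
}
{code}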



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

