hadoop-hdfs-issues mailing list archives

From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-10958) Add instrumentation hooks around Datanode disk IO
Date Tue, 13 Dec 2016 02:09:58 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10958:
    Attachment: HDFS-10958.05.patch

Thank you for the thorough review [~xyao]! I really appreciate it. The v05 patch addresses
most of your feedback (comments below). [This commit|https://github.com/arp7/hadoop/commit/1a5601460e830a161da1ad8ed586b3150c09971e#diff-5d7a0451c23486b45f5bda85b7022e5f]
shows the delta between the v04 and v05 patches.

Comments below:
# bq. NIT: Can you update the comment (line 745, line 747) to reflect the changes of the returned
type? "FileInputStream" -> "FileDescriptor"
#  bq. Line 149: BlockMetadataHeader#readHeader(File file) can be removed
# bq. NIT: Line 1033: BlockReceiver#adjustCrcFilePosition().  can we use streams.flushChecksumOut()
We need to call flush on the buffered output stream here. Calling streams.flushChecksumOut()
will not flush the buffered data to the underlying FileOutputStream.
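The distinction can be seen in a self-contained sketch (using a ByteArrayOutputStream in place of the real FileOutputStream): flushing only the underlying stream does not drain the BufferedOutputStream's buffer.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class FlushDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream inner = new ByteArrayOutputStream(); // stands in for FileOutputStream
        BufferedOutputStream buffered = new BufferedOutputStream(inner);

        buffered.write(42);   // the byte sits in the buffer, not in 'inner'
        inner.flush();        // flushing only the underlying stream moves nothing
        System.out.println("after inner.flush(): " + inner.size() + " bytes");

        buffered.flush();     // flushing the buffered stream pushes the data down
        System.out.println("after buffered.flush(): " + inner.size() + " bytes");
    }
}
```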
# bq. NIT: Line 59: Can we move DatanodeUtil#createFileWithExistsCheck to FileIoProvider like
we do for mkdirsWithExistsCheck/deleteWithExistsCheck?
This method was awkward to adapt to the call pattern in FileIoProvider. However, I do pass
the individual operations to the FileIoProvider, so the exists/create calls will be instrumented.
Let me know if you feel strongly about it. :)
# bq. Line 1365: DataStorage#fullyDelete(). I'm OK with deprecate it.
Done. Removed the unused method.
# bq. NIT: Can you add a short description for the new key added or add cross reference to
the description in FileIoProvider class description.
I intentionally haven't documented this key as it's not targeted for end users. I have the
following text in the FileIoProvider javadoc. Let me know if this looks sufficient for now.
 * Behavior can be injected into these events by implementing
 * {@link FileIoEvents} and replacing the default implementation
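To illustrate the injection point that javadoc describes, a custom events implementation might look roughly like the sketch below. The nested interface here is a hypothetical placeholder, not the exact FileIoEvents signatures from the patch; the 300 ms threshold is likewise illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LoggingFileIoEvents {
    /** Placeholder for the patch's FileIoEvents interface (hypothetical signatures). */
    interface Events {
        long beforeFileIo(long len);
        void afterFileIo(long begin, long len);
        void onFailure(Throwable t);
    }

    /** Example behavior injected into IO events: log slow ops, count failures. */
    static class SlowIoLogger implements Events {
        private final AtomicLong failures = new AtomicLong();

        public long beforeFileIo(long len) {
            return System.nanoTime();          // timestamp for latency tracking
        }

        public void afterFileIo(long begin, long len) {
            long elapsedMs = (System.nanoTime() - begin) / 1_000_000;
            if (elapsedMs > 300) {             // arbitrary slow-IO threshold
                System.err.println("slow disk IO: " + elapsedMs + " ms");
            }
        }

        public void onFailure(Throwable t) {
            failures.incrementAndGet();        // count errors, per the review comment
        }

        long failureCount() {
            return failures.get();
        }
    }
}
```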
# bq. NIT: these imports re-ordered with the imports below it
I don't see this issue in my diffs. Let me know if you still see it.
# bq. Line 1075: DatanodeUtil.dirNoFilesRecursive() can be wrapped into FileIoProvider.java
to get some aggregated metrics of dirNoFilesRecursive() in addition to FileIoProvider#listFiles().
I deferred doing this since any disk slowness will show up in the fileIoProvider.listFiles call.
Can we re-evaluate instrumenting the recursive call in a follow-up jira?
# bq.  Line: 202: this is a bug. We should delete the tmpFile instead of the file.
Good catch, fixed.
# bq. Line 322,323: Should we close crcOut like blockOut and metaRAF here? Can this be improved
with a try-with-resource to avoid leaking.
Good catch, fixed it. It looks like this is a pre-existing bug. We can't use try-with-resources
though, as we only want to close the streams when there is an exception.
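The usual alternative to try-with-resources in this situation is a success flag checked in a finally block: the streams stay open on the happy path (the caller takes ownership) but get closed on any failure. A minimal sketch, where openStream() is a hypothetical factory standing in for the real stream creation:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class CloseOnFailureSketch {
    /**
     * Opens a stream and runs setup that may throw. The stream is closed
     * only on the failure path; on success the caller takes ownership.
     */
    static OutputStream openAndInitialize() throws IOException {
        OutputStream out = openStream();
        boolean success = false;
        try {
            out.write(0);        // setup work that may throw
            success = true;
            return out;          // caller now owns the open stream
        } finally {
            if (!success) {
                out.close();     // close only when an exception escaped
            }
        }
    }

    // Hypothetical factory; the real code opens file-backed streams.
    static OutputStream openStream() {
        return new ByteArrayOutputStream();
    }
}
```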
# bq. Line 89: FileIoEvents#onFailure() can we add a begin parameter for the failure code
path so that we can track the time spent on FileIo/Metadata before failure.
# bq. CountingFileIoEvents.java - Should we count the number of errors in onFailure()? 
# bq. FileIoProvider.java - NIT: some of the methods are missing Javadocs for the last few
added @param such as flush()/listDirectory()/linkCount()/mkdirs, etc.
# bq. Line 105: NIT: We can add a tag to the enum FileIoProvider#OPERATION to explicitly describe
the operation type FileIo/Metadata, which could simplify the FileIoEvents interface. I'm OK
with the current implementation, which is also good and easy to follow. 
Leaving it as it is for now to avoid complicating the patch further, but we can definitely
revise the interface as we work on implementations.
# bq. Line 155: I think we should put sync() under fileIo op instead of metadata op based
on we are passing true
# bq. Line 459: FileIoProvider#fullyDelete() should we declare exception just for fault injection
purpose? FileUtil.fullyDelete() itself does not throw. 
Good point. The only exception we could get in fullyDelete is a RuntimeException, so there
is no change to the signature. I decided to pass all exceptions to the failure handler (Errors
excluded) and let it decide which ones are interesting.
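That policy can be sketched as a wrapper that reports any RuntimeException to the failure hook and rethrows it unchanged, while Errors propagate uncaught. The listener interface and method names below are hypothetical stand-ins, not the patch's actual API:

```java
import java.util.function.BooleanSupplier;

public class FailureReportingSketch {
    /** Hypothetical stand-in for the patch's failure hook. */
    interface FailureListener {
        void onFailure(Throwable t);
    }

    /**
     * Runs a delete-style operation. RuntimeExceptions are reported to the
     * listener and then rethrown, so the method signature stays unchanged;
     * Errors are deliberately not intercepted.
     */
    static boolean instrumentedDelete(BooleanSupplier op, FailureListener listener) {
        try {
            return op.getAsBoolean();   // e.g. () -> FileUtil.fullyDelete(dir)
        } catch (RuntimeException e) {
            listener.onFailure(e);      // listener decides which failures matter
            throw e;                    // rethrow unchanged
        }
    }
}
```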
# bq. Line 575: NIT: File f -> File dir, Line 598: NIT: File f -> File dir
Fixed both.
# bq. Line 148: ReplicaOutputStreams#writeDataToDisk(), should we change the dataOut/checksumOut
to use the FileIoProvider#WrappedFileOutputStream to get the FileIo write counted properly?
These are already wrapped output streams. See LocalReplicaInPipeline.java:310.
# bq.  Line 83 readDataFully() should we change the dataIn/checksumIn  to use the FileIoProvider#WrappedFileInputStream
to get the FileIo read counted properly? 
These are also wrapped input streams. See LocalReplica#getDataInputStream where the streams
are allocated.
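For readers unfamiliar with the wrapping pattern used here: a counted stream is typically a FilterInputStream that bumps a shared counter on each read before delegating. The sketch below is only illustrative of that idea, not the patch's actual WrappedFileInputStream; the shared AtomicLong stands in for whatever metrics sink the provider uses.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicLong;

public class CountingInputStreamSketch extends FilterInputStream {
    private final AtomicLong bytesRead;   // shared counter, e.g. per volume

    public CountingInputStreamSketch(InputStream in, AtomicLong bytesRead) {
        super(in);
        this.bytesRead = bytesRead;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            bytesRead.incrementAndGet();  // count the single byte read
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            bytesRead.addAndGet(n);       // count the bytes actually read
        }
        return n;
    }
}
```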

I also removed the gson dependency per offline feedback from [~anu].

> Add instrumentation hooks around Datanode disk IO
> -------------------------------------------------
>                 Key: HDFS-10958
>                 URL: https://issues.apache.org/jira/browse/HDFS-10958
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Xiaoyu Yao
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10958.01.patch, HDFS-10958.02.patch, HDFS-10958.03.patch, HDFS-10958.04.patch,
> This ticket is opened to add instrumentation hooks around Datanode disk IO based on refactor
work from HDFS-10930.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
