hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
Date Thu, 28 May 2015 23:34:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563895#comment-14563895
] 

Andrew Wang commented on HDFS-7923:
-----------------------------------

Neato patch Colin, this is a high-ish level review, I probably need to do another pass.

Small stuff:
* Missing config key documentation in hdfs-defaults.xml
* requestBlockReportLeaseId: empty catch for unregistered node, we could add some more informative
logging rather than relying on the warn below

BlockReportLeaseManager
* I discussed the NodeData structure with Colin offline, wondering why we didn't use a standard
Collection. Colin brought up the reason of reducing garbage, which seems valid. I think we
should consider implementing IntrusiveCollection though rather than writing another.
* I also asked about putting NodeData into DatanodeDescriptor. Not sure what the conclusion
was on this, it might reduce garbage since we don't need a separate NodeData object.
* I prefer Precondition checks for invalid configuration values at startup, so there aren't
any surprises for the user. Not everyone reads the messages on startup.
* requestLease has a check for isTraceEnabled, then logs at debug level

BPServiceActor:
* In offerService, we ignore the new leaseID if we already have one. On the NN though, a new
request wipes out the old leaseID, and processReport checks based on leaseID rather than node.
This kind of bug makes me wonder why we really need the leaseID at all, why not just attach
a boolean to the node? Or if it's in the deferred vs. pending list?
* Can we fix the javadoc for scheduleBlockReport to mention randomness, and not "send...at
the next heartbeat?" Incorrect right now.
* Have you thought about moving the BR scheduler to the NN side? We still rely on the DNs
to jitter themselves and do the initial delay, but we could have the NN handle all this. This
would also let the NN trigger FBRs whenever it wants. We could also do better than random
scheduling, i.e. stride it rather than jitter. Incompatible, so we probably won't, but fun
to think about :)
* scheduleBlockReport(long) do we want to add a checkArgument that delayMs is geq 0? You nixed
the else case.

DatanodeManager:
* Could we do the BRLManager register/unregister in addDatanode and removeDatanode? I think
this is safe, since on a DN restart it'll provide a lease ID of 0 and FBR, even without a
reg/unreg.

> The DataNodes should rate-limit their full block reports by asking the NN on heartbeat
messages
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7923
>                 URL: https://issues.apache.org/jira/browse/HDFS-7923
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch
>
>
> The DataNodes should rate-limit their full block reports.  They can do this by first
sending a heartbeat message to the NN with an optional boolean set which requests permission
to send a full block report.  If the NN responds with another optional boolean set, the DN
will send an FBR... if not, it will wait until later.  This can be done compatibly with optional
fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message