hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10742) Measurement of lock held time in FsDatasetImpl
Date Sun, 04 Sep 2016 23:56:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15463720#comment-15463720

Chris Douglas commented on HDFS-10742:

bq. Removed per-method measurement and aggregate stats to minimize overhead for cheap operations

Recording stack traces, etc. may be useful. I raised those overheads for two reasons. First,
that diagnostic data should only be built when necessary. Previous versions of the patch generated
diagnostic data that was ultimately discarded. Second, minutiae like "GC-friendliness", whether
a wrapper class on {{Lock}} adds dispatch overhead, class loading overhead, etc. when this
JIRA logs critical sections longer than *300ms* are irrelevant, literally by orders of magnitude.

I'll ask again after the motivation. Is the intent to identify subsystems that do too much
work (and/or I/O) in a critical section? To measure where the DN is spending cycles? In some
of these cases, stack traces could be very helpful. As long as this telemetry can be tuned
by configuration, it shouldn't create new problems for operators. To that point, instead of
reusing the {{FsDatasetImpl}} log, this may want to use its own (so disabling/redirecting
it doesn't affect other, critical information written to that log).

bq. Was your concern was with exposing AutoCloseableLock in hadoop utils or just an overall
objection to this idiom even if it's not exposed outside the DN.

I have two objections, one to the idiom and one to the design.

{{AutoCloseableLock}} is a new abstraction that doesn't elide any details of the thing it
obscures. Every path for {{Lock}} objects is possible, so it doesn't make it easier to reason
about lock correctness. It creates a special type for the _simplest_ possible case. This case
is so simple, we have automated tooling to detect it. Further, instead of every Java developer
looking at a {{Lock}} and scanning for well-worn expectations, we introduce an abstraction
that requires investigation. No advantage beyond developer convenience has been raised, and
it is a demonstrably _inconvenient_ abstraction.

If the interface were limited so it could _only_ be used in the resource idiom (as [proposed|https://s.apache.org/Eyaw]
earlier), then a developer could know that the {{Lock}} acquisition was limited to scope.
As semantic sugar it would still be pointless, because Java programmers also know how to use

This could be redeemed by changing the design. If the {{AutoCloseableLock}} constructor accepted
a {{Lock}} instance and cut its API to scope-limited changes, then telemetry like this could
be added as implementations of the {{Lock}} type with all the perceived advantages of ACL.
Devs can reason about the simpler API of ACL (scope-based acquisition), but still monitor
all acquisitions as required by this JIRA (impractical to implement using {{synchronized}}).

I remain skeptical of {{AutoCloseableLock}}, since one could just use the {{Lock}} in interface
and existing tooling to implement the same. But if its virtues are obvious to everyone but
me I won't stand in the way of including it.

> Measurement of lock held time in FsDatasetImpl
> ----------------------------------------------
>                 Key: HDFS-10742
>                 URL: https://issues.apache.org/jira/browse/HDFS-10742
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>         Attachments: HDFS-10742.001.patch, HDFS-10742.002.patch, HDFS-10742.003.patch,
HDFS-10742.004.patch, HDFS-10742.005.patch, HDFS-10742.006.patch, HDFS-10742.007.patch, HDFS-10742.008.patch,
HDFS-10742.009.patch, HDFS-10742.010.patch, HDFS-10742.011.patch
> This JIRA proposes to measure the time the of lock of {{FsDatasetImpl}} is held by a
thread. Doing so will allow us to measure lock statistics.
> This can be done by extending the {{AutoCloseableLock}} lock object in {{FsDatasetImpl}}.
In the future we can also consider replacing the lock with a read-write lock.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

View raw message