hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5664) try to relieve the BlockReaderLocal read() synchronized hotspot
Date Mon, 16 Dec 2013 02:01:09 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848753#comment-13848753

Colin Patrick McCabe commented on HDFS-5664:

bq. Thanks Colin. Then we should remove all synchronization if single threaded only?

That's a reasonable reply.  But I suspect that in practice, removing "synchronized" from all
those methods would break a lot of code that currently works.  The overhead of locking also
tends to be low on modern CPUs when the lock is not contended, so I don't think that we'd
save that much.  It would be interesting to benchmark, though.

I kind of wish that we were able to do multiple preads in parallel, but I suspect that the
amount of refactoring you would need to get to that state would be massive... right now there
is an assumption that everything in the stream is done under a big lock.
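
For comparison, here is a minimal sketch of what lock-free parallel positional reads look like against a plain local file with java.nio. It is not DFSInputStream code: the class name ParallelPreadSketch, the pool size of 4, and the helper names are all made up for illustration. FileChannel.read(ByteBuffer, long) does not touch the channel's own position, so several threads can share one channel without a stream-wide lock.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not part of HDFS.
class ParallelPreadSketch {

    // Positional read: reads up to len bytes starting at pos without using
    // or modifying the channel's current position.
    static byte[] pread(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos + buf.position());
            if (n < 0) {
                break; // hit end of file before len bytes were read
            }
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    // Issues one pread per offset on a small thread pool, all sharing one channel.
    static List<byte[]> readAll(Path file, long[] offsets, int len) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is arbitrary
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (long off : offsets) {
                futures.add(pool.submit(() -> pread(ch, off, len)));
            }
            List<byte[]> results = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                results.add(f.get()); // wait before the channel is closed
            }
            return results;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the sketch is only that each pread carries its own offset and buffer, so there is no shared cursor to protect. Getting the HDFS stream into that shape is the refactoring I was referring to.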

bq. Could we save on NN trips if we had added a 'clone' of DFSIS where'd create a new one
passing in an existing one; the new DFSIS would use the block info the original had already
obtained which would be enough to get the new DFSIS off the ground w/o a trip to the NN?

I haven't thought about it too much, but that seems like an interesting idea.  Probably a
good direction to go in.  We could definitely copy the block location information from one
stream into another new stream.  You would not be able to reuse the TCP socket, though, if
it were a remote read.  But that would still save you a trip to the NameNode.
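
A rough sketch of the clone idea, just to make it concrete. Every name here (BlockLocationSketch, FakeNameNode, StreamSketch, cloneStream) is hypothetical and none of it is the real DFSInputStream API. The clone shares an immutable snapshot of the block locations and keeps its own position; per-stream state such as a socket would not be copied.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for one block's location info.
final class BlockLocationSketch {
    final long offset;
    final long length;
    final String host;

    BlockLocationSketch(long offset, long length, String host) {
        this.offset = offset;
        this.length = length;
        this.host = host;
    }
}

// Hypothetical NameNode stub that counts how many lookups it serves.
final class FakeNameNode {
    final AtomicInteger lookups = new AtomicInteger();

    List<BlockLocationSketch> getBlockLocations(String path) {
        lookups.incrementAndGet();
        List<BlockLocationSketch> blocks = new ArrayList<>();
        blocks.add(new BlockLocationSketch(0L, 128L, "dn1"));
        blocks.add(new BlockLocationSketch(128L, 128L, "dn2"));
        return blocks;
    }
}

// Hypothetical stream: block locations are shared, position is per stream.
final class StreamSketch {
    private final String path;
    private final List<BlockLocationSketch> locations; // immutable snapshot
    private long pos; // never shared between clones

    private StreamSketch(String path, List<BlockLocationSketch> locations) {
        this.path = path;
        this.locations = locations;
    }

    // Opening a stream costs one (fake) NameNode lookup.
    static StreamSketch open(FakeNameNode nn, String path) {
        List<BlockLocationSketch> snapshot =
            Collections.unmodifiableList(new ArrayList<>(nn.getBlockLocations(path)));
        return new StreamSketch(path, snapshot);
    }

    // Cloning reuses the snapshot and starts at position 0; no NameNode trip.
    StreamSketch cloneStream() {
        return new StreamSketch(path, locations);
    }

    void seek(long newPos) {
        pos = newPos;
    }

    long getPos() {
        return pos;
    }

    // Finds which host serves the block containing the given position.
    String hostFor(long position) {
        for (BlockLocationSketch b : locations) {
            if (position >= b.offset && position < b.offset + b.length) {
                return b.host;
            }
        }
        throw new IllegalArgumentException("position out of range: " + position);
    }
}
```

This leaves open what happens if the copied locations go stale; the sketch only shows that the second stream can get off the ground without a second lookup.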

> try to relieve the BlockReaderLocal read() synchronized hotspot
> ---------------------------------------------------------------
>                 Key: HDFS-5664
>                 URL: https://issues.apache.org/jira/browse/HDFS-5664
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
> Currently, BlockReaderLocal's read() has a synchronized modifier:
> {code}
> public synchronized int read(byte[] buf, int off, int len) throws IOException {
> {code}
> In an HBase cluster with heavy physical reads, we observed some hotspots in the dfsclient path;
> the detailed strace trace can be found at: https://issues.apache.org/jira/browse/HDFS-1605?focusedCommentId=13843241&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13843241
> I haven't looked into the details yet, but here are some raw ideas to start with:
> 1) replace synchronized with a try-lock-with-timeout pattern, so a read can fail fast; 2) fall back
> to non-ssr mode if acquiring the local reader lock fails.
> There are at least two scenarios where removing this hotspot would help:
> 1) Local physical read heavy, e.g. the HBase block cache miss ratio is high
> 2) slow/bad disk.
> It would help to achieve a lower 99th percentile HBase read latency.
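
The try-lock-with-timeout idea from the description above could be sketched roughly as follows. This is not BlockReaderLocal code: TryLockReadSketch, its timeout parameter, and the two Supplier arguments standing in for the local read and the fallback read are assumptions made for illustration. Only ReentrantLock.tryLock(long, TimeUnit) is a real JDK call.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical illustration only; not part of HDFS.
class TryLockReadSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final long timeoutMillis;

    TryLockReadSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Tries to take the reader lock for at most timeoutMillis. On success the
    // local read runs under the lock; otherwise the fallback read runs
    // without it (for example, a read that does not go through this reader).
    <T> T read(Supplier<T> localRead, Supplier<T> fallbackRead) {
        boolean locked;
        try {
            locked = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            locked = false;
        }
        if (!locked) {
            return fallbackRead.get(); // fail fast instead of queueing on the lock
        }
        try {
            return localRead.get();
        } finally {
            lock.unlock();
        }
    }
}
```

With this shape, a caller stuck behind a slow disk read waits at most the timeout before taking the other path, which is the fail-fast behavior the description asks for. Whether the fallback path is actually cheaper would need measuring.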

This message was sent by Atlassian JIRA
