hadoop-hdfs-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
Date Thu, 20 Nov 2014 06:58:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219072#comment-14219072

Lars Hofhansl commented on HDFS-6735:

Thanks [~cmccabe]. "infoLock" is better. I'll fix the indentation later. Let me have a look
at tryReadZeroCopy again. I had mapped out all members and which methods use what, and concluded
that the synchronized wasn't needed, but it's quite possible I made a mistake.

Another locking option is not to synchronize on <this> at all, but to have two locks
("streamLock" and "pLock", or whatever are good names). That way the intent might be clearer.
Yet another option would be to disentangle the two APIs by subclassing or delegation (since
the issue really is that we have state for two different modes of operation in the same class);
that'd be a bigger change, though.
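
To make the two-lock idea concrete, here is a minimal sketch. It is not the DFSInputStream
code and not what any of the attached patches do: the class TwoLockStream, its in-memory
byte[] backing store, and the preadCount field are all hypothetical, invented only to show
stateful reads and positional reads guarded by separate locks instead of <this>.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration only; an in-memory stand-in for a block-backed stream. */
final class TwoLockStream {
    private final byte[] data;

    // Guards state used only by the stateful read() path (the current position).
    private final ReentrantLock streamLock = new ReentrantLock();
    // Guards state touched by the positional read path (here just a counter).
    private final ReentrantLock pLock = new ReentrantLock();

    private int pos = 0;          // protected by streamLock
    private long preadCount = 0;  // protected by pLock

    TwoLockStream(byte[] data) {
        this.data = data.clone();
    }

    /** Stateful read: advances pos, so it is serialized by streamLock. */
    int read(byte[] buf, int off, int len) {
        streamLock.lock();
        try {
            if (pos >= data.length) {
                return -1;
            }
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, buf, off, n);
            pos += n;
            return n;
        } finally {
            streamLock.unlock();
        }
    }

    /** Positional read: never reads or writes pos and never takes streamLock. */
    int pread(long position, byte[] buf, int off, int len) {
        if (position < 0 || position >= data.length) {
            return -1;
        }
        int start = (int) position;
        int n = Math.min(len, data.length - start);
        System.arraycopy(data, start, buf, off, n);
        pLock.lock();
        try {
            preadCount++; // short critical section, independent of read()
        } finally {
            pLock.unlock();
        }
        return n;
    }

    int getPos() {
        streamLock.lock();
        try {
            return pos;
        } finally {
            streamLock.unlock();
        }
    }

    long getPreadCount() {
        pLock.lock();
        try {
            return preadCount;
        } finally {
            pLock.unlock();
        }
    }
}
```

In this sketch a slow read() holding streamLock cannot delay a concurrent pread(), because
the two paths share no lock. The real class has more shared state than this, so which
members would go under which lock is exactly the mapping exercise mentioned above.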

Meanwhile in HBase land:
Tested this with HBase and observed with a sampler that all delays internal to DFSInputStream
are gone, which is nice.

I committed a change to HBase to allow us to (1) have compactions use their own input streams
so they do not interfere with user scans along the same files and (2) optionally force p-reads
for all user scans. See HBASE-12411.

Especially with #2 I see nice speedups for many concurrent scanners, essentially up to what my
disks can sustain, but a 50% slowdown when there is only a single scanner per file, which is
expected since we're no longer benefiting from prefetching.

> A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
> -----------------------------------------------------------------------------------------
>                 Key: HDFS-6735
>                 URL: https://issues.apache.org/jira/browse/HDFS-6735
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt
> In the current DFSInputStream impl, there are a couple of coarse-grained locks in the read/pread
path, and they have become an HBase read latency pain point. In HDFS-6698, I made a minor
patch against the first encountered lock, around getFileLength. Indeed, after reading the code
and testing, it turns out there are still other locks we could improve.
> In this jira, I'll make a patch against the other locks, and a simple test case to show the
issue and the improved result.
> This is important for HBase, since in the current HFile read path we issue all
read()/pread() requests in the same DFSInputStream for one HFile. (A multi-stream solution
is another story that I had planned to do, but it will probably take more time than I expected.)

This message was sent by Atlassian JIRA
