hadoop-hdfs-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
Date Fri, 31 Oct 2014 22:57:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192684#comment-14192684 ]

Lars Hofhansl commented on HDFS-6698:
-------------------------------------

Pulling in selected changes from HDFS-6735 yields a HUGE speed improvement. A scan that took
16s to execute now finishes in 9s. (The setup is such that all data fits into the OS cache, and the
HBase cache is disabled to isolate this code path.)


> try to optimize DFSInputStream.getFileLength()
> ----------------------------------------------
>
>                 Key: HDFS-6698
>                 URL: https://issues.apache.org/jira/browse/HDFS-6698
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, HDFS-6698v2.txt, HDFS-6698v3.txt
>
>
> HBase prefers to invoke read() to serve scan requests and pread() to serve get requests, because pread() holds almost no lock.
> Imagine a read() is running. Its definition is:
> {code}
> public synchronized int read
> {code}
> so no other read() can run concurrently. That much is known, but pread() cannot run either, because:
> {code}
>   public int read(long position, byte[] buffer, int offset, int length)
>     throws IOException {
>     // sanity checks
>     dfsClient.checkOpen();
>     if (closed) {
>       throw new IOException("Stream closed");
>     }
>     failures = 0;
>     long filelen = getFileLength();
> {code}
> getFileLength() also needs the lock, so we need to figure out a lock-free implementation of getFileLength()
> before the HBase multi-stream feature is done. [~saint.ack@gmail.com]
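The contention described above can be reproduced outside HDFS with a small, self-contained sketch. Everything here is hypothetical: `Stream`, `SynchronizedLength`, `VolatileLength` and `preadCompletesDuringRead` are illustrative names, not DFSInputStream code, and this is not the patch attached to HDFS-6698. The sketch assumes the file length can be published as a single `long`; one thread holds the stream's monitor inside a slow `read()` while another calls `pread()`, which only needs `getFileLength()`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PreadLockDemo {

    /** Hypothetical stream with a stateful read() and a positional pread(). */
    static abstract class Stream {
        protected final byte[] data;
        private int pos = 0;

        Stream(byte[] data) { this.data = data; }

        /** Stateful read: holds the stream's monitor for the whole call. */
        public synchronized int read(CountDownLatch lockHeld, CountDownLatch release)
                throws InterruptedException {
            lockHeld.countDown(); // signal that the monitor is now held
            release.await();      // simulate a slow read while still holding the monitor
            return pos < data.length ? data[pos++] : -1;
        }

        abstract long getFileLength();

        /** Positional read: takes no lock itself, but calls getFileLength(). */
        public int pread(long position, byte[] buf, int off, int len) {
            long fileLen = getFileLength();
            if (position >= fileLen) return -1;
            int n = (int) Math.min(len, fileLen - position);
            System.arraycopy(data, (int) position, buf, off, n);
            return n;
        }
    }

    /** Length guarded by the stream's monitor: pread() queues behind read(). */
    static class SynchronizedLength extends Stream {
        private long fileLength;
        SynchronizedLength(byte[] d) { super(d); fileLength = d.length; }
        @Override synchronized long getFileLength() { return fileLength; }
    }

    /** Length published through a volatile field: no monitor is needed to read it. */
    static class VolatileLength extends Stream {
        private volatile long fileLength;
        VolatileLength(byte[] d) { super(d); fileLength = d.length; }
        @Override long getFileLength() { return fileLength; }
    }

    /** True if pread() finishes within 500 ms while another thread is inside read(). */
    static boolean preadCompletesDuringRead(Stream s) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            pool.submit(() -> s.read(lockHeld, release));
            lockHeld.await(); // wait until read() really holds the monitor
            Future<Integer> p = pool.submit(() -> s.pread(0, new byte[4], 0, 4));
            try {
                p.get(500, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false;
            }
        } finally {
            release.countDown(); // let read() finish so any blocked pread() can proceed
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello hdfs".getBytes();
        System.out.println("synchronized getFileLength: "
                + preadCompletesDuringRead(new SynchronizedLength(data)));
        System.out.println("volatile getFileLength: "
                + preadCompletesDuringRead(new VolatileLength(data)));
    }
}
```

The synchronized variant should report `false`, because `pread()` cannot enter `getFileLength()` until `read()` releases the monitor. The volatile variant should report `true`. A single volatile field only works if the length is one value. If the real length is derived from several fields that must be read consistently, a small dedicated lock object, separate from the stream's own monitor, is the usual alternative.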



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
