hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream
Date Wed, 11 Nov 2009 23:00:41 GMT

     [ https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-755:

    Attachment: hdfs-755.txt

Here's an updated patch which fixes some behavior when running against an unpatched Common.
If Common includes HADOOP-3205, it will be faster, and if it doesn't include HADOOP-3205,
it should still work at the old speed.

I also ran some more benchmarks over lunch, running "fs -cat bigfile bigfile bigfile ...20
times..." repeatedly with and without the patch. This differs from my previous benchmark in
that each JVM runs for a good 40-50 seconds, enough time to fully JIT the code, etc. The
patch is about a 3.4% speedup compared to trunk for these long reads as well (at the 95%
significance level).

> Read multiple checksum chunks at once in DFSInputStream
> -------------------------------------------------------
>                 Key: HDFS-755
>                 URL: https://issues.apache.org/jira/browse/HDFS-755
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-755.txt, hdfs-755.txt
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple checksum
> chunks in a single call to readChunk. This is the HDFS-side use of that new feature.
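
To illustrate the idea in the description, here is a minimal, self-contained Java sketch of verifying several checksum chunks in one call rather than one call per chunk. It is not the code from the attached patch and does not use the FSInputChecker API: the class name, the method names, and the 512-byte chunk size are all assumptions made for the example, and CRC32 from the JDK stands in for whatever checksum the stream is configured with.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical illustration only; names and sizes are assumptions, not Hadoop APIs.
final class MultiChunkChecksumSketch {
    static final int BYTES_PER_CHECKSUM = 512; // assumed chunk size
    static final int CHECKSUM_SIZE = 4;        // one CRC32 stored as a 4-byte int

    /** Computes one CRC32 per chunk of data[0..len) and packs them back to back. */
    public static byte[] computeChecksums(byte[] data, int len) {
        int numChunks = (len + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        ByteBuffer sums = ByteBuffer.allocate(numChunks * CHECKSUM_SIZE);
        CRC32 crc = new CRC32();
        for (int off = 0; off < len; off += BYTES_PER_CHECKSUM) {
            int n = Math.min(BYTES_PER_CHECKSUM, len - off);
            crc.reset();
            crc.update(data, off, n);
            sums.putInt((int) crc.getValue());
        }
        return sums.array();
    }

    /**
     * Verifies every chunk in data[0..len) against the packed checksums in a
     * single call and returns the number of chunks verified. The last chunk
     * may be shorter than BYTES_PER_CHECKSUM.
     */
    public static int verifyChunks(byte[] data, int len, byte[] checksums) {
        ByteBuffer sums = ByteBuffer.wrap(checksums);
        CRC32 crc = new CRC32();
        int chunks = 0;
        for (int off = 0; off < len; off += BYTES_PER_CHECKSUM) {
            int n = Math.min(BYTES_PER_CHECKSUM, len - off);
            crc.reset();
            crc.update(data, off, n);
            if ((int) crc.getValue() != sums.getInt()) {
                throw new IllegalStateException("Checksum mismatch in chunk " + chunks);
            }
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Three full chunks plus a 100-byte partial chunk.
        byte[] data = new byte[3 * BYTES_PER_CHECKSUM + 100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] sums = computeChecksums(data, data.length);
        System.out.println(verifyChunks(data, data.length, sums) + " chunks verified");
    }
}
```

The point of the sketch is only that a caller handing over a large buffer pays the per-call overhead once for many chunks; how the actual patch arranges its buffers and interacts with Common is not shown here.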

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
