hadoop-hdfs-issues mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream
Date Mon, 28 Dec 2009 21:53:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794943#action_12794943 ]

Raghu Angadi commented on HDFS-755:
-----------------------------------

There is always a buffer of 512 bytes (the checksum chunk size), so the worst case is 512-byte
reads. If 512 is not large enough, we can decide on some size like 4k. This way large readers
benefit from reduced copying and small readers pay a small penalty (1 syscall per 4k).
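
To make the trade-off above concrete, here is a minimal sketch, not the DFSInputStream code. The class name BypassingReader and the underlyingReads counter are hypothetical, and each call to the wrapped stream stands in for one syscall. Reads at least as large as the internal buffer go straight into the caller's array (no extra copy), while smaller reads are served from a buffer that is refilled once per bufSize bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; names and structure are assumptions.
class BypassingReader {
    private final InputStream in;
    private final byte[] buf;   // small intermediate buffer, e.g. 4096 bytes
    private int pos = 0;        // next unread byte in buf
    private int limit = 0;      // number of valid bytes in buf
    int underlyingReads = 0;    // calls to in.read(), standing in for syscalls

    BypassingReader(InputStream in, int bufSize) {
        this.in = in;
        this.buf = new byte[bufSize];
    }

    int read(byte[] dst, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos == limit) {
            if (len >= buf.length) {
                // Large reader: skip the intermediate buffer, so no extra copy.
                underlyingReads++;
                return in.read(dst, off, len);
            }
            // Small reader: refill the buffer with one underlying read.
            underlyingReads++;
            int n = in.read(buf, 0, buf.length);
            if (n <= 0) {
                return -1;
            }
            pos = 0;
            limit = n;
        }
        // Serve the request from whatever is left in the buffer.
        int n = Math.min(len, limit - pos);
        System.arraycopy(buf, pos, dst, off, n);
        pos += n;
        return n;
    }
}
```

With a 4096-byte buffer, forty 100-byte reads cost a single underlying read, while an 8192-byte read issued when the buffer is empty is passed through directly.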

The misalignment can occur even after the first packet. Another option is to have two buffers
which are read alternately for crc and data (each time checking if the other buffer has
available data).

>  So, I don't think we should do optimizations that would destroy performance of this
scenario.

True. At the same time, this is an optimization jira.

I didn't get around to reproducing the CPU improvement. I ran the commands you gave (in email).
Will try again today.

I have already given a +1 for the patch. We should just note that it needs more work to actually
make use of HADOOP-3205.

> Read multiple checksum chunks at once in DFSInputStream
> -------------------------------------------------------
>
>                 Key: HDFS-755
>                 URL: https://issues.apache.org/jira/browse/HDFS-755
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, hdfs-755.txt,
hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt
>
>
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple checksum
chunks in a single call to readChunk. This is the HDFS-side use of that new feature.
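
As a rough illustration of what handling several checksum chunks in one call can look like, the sketch below verifies one checksum per 512-byte chunk over a whole buffer. It is not the FSInputChecker or readChunk code; the class ChunkedChecksums, its method names, and the use of java.util.zip.CRC32 are assumptions made for the example.

```java
import java.util.zip.CRC32;

// Hypothetical illustration only; not the Hadoop implementation.
class ChunkedChecksums {
    static final int BYTES_PER_CHECKSUM = 512; // checksum chunk size

    // Compute one CRC32 per chunk over data[0..len); the last chunk may be short.
    static long[] compute(byte[] data, int len) {
        int chunks = (len + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int n = Math.min(BYTES_PER_CHECKSUM, len - off);
            crc.reset();
            crc.update(data, off, n);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Verify every chunk in a single call.
    // Returns the index of the first mismatching chunk, or -1 if all match.
    static int verify(byte[] data, int len, long[] expected) {
        long[] actual = compute(data, len);
        if (actual.length != expected.length) {
            throw new IllegalArgumentException("checksum count does not match data length");
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

A caller holding 1300 bytes would pass three checksums (two full chunks plus one partial chunk) and get back either -1 or the index of the first corrupt chunk.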

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

