hadoop-common-issues mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3205) FSInputChecker and FSOutputSummer should allow better access to user buffer
Date Tue, 03 Nov 2009 21:03:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773191#action_12773191 ]

Raghu Angadi commented on HADOOP-3205:

Obviously, one buffer copy is not expected to consume more CPU than a CRC32 checksum (much
less for checksums of small chunks like 512 bytes). I roughly estimated each buffer copy to take
around one third of what CRC32 takes (the ratio might be larger with an improved CRC32). Does that
mean it is not worth fixing the second or third largest CPU hogs on the client? Of course, when
compression or other heavier processing is involved, even CRC32 would not be the largest CPU hog.

We reduced CPU usage on the DataNode while serving data (HADOOP-2758, HADOOP-3164) mainly by avoiding
buffer copies (there is CRC involved). All the benchmarks there measure CPU consumed based on the actual
CPU time reported by the OS (not by a profiler). The workload is also essentially a 'dfs -cat'.

In your tests, is it reading a DFS file? I used 'dfs -cat' extensively in the DataNode CPU benchmarks
reported in the above JIRAs.

bq. This seems like a fair amount of confusing complexity due to the buffer management involved.

I am not so sure. But just not buffering at all might be good enough (the smallest size would
still be 512 bytes).
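
To make the copy-versus-checksum comparison concrete, here is a rough sketch (not the actual FSInputChecker or FSOutputSummer code) of checksumming 512-byte chunks directly out of the caller's buffer, next to a variant that first copies each chunk into a staging buffer. The class name, method names, and the BYTES_PER_CHECKSUM constant are hypothetical and chosen only for illustration; the only real API used is java.util.zip.CRC32.

```java
import java.util.zip.CRC32;

// Hypothetical illustration only; none of these names come from Hadoop.
public class ChunkedChecksum {
    // Assumed chunk size, matching the 512 bytes discussed above.
    static final int BYTES_PER_CHECKSUM = 512;

    // Checksums each chunk in place, reading straight from the user buffer.
    static long[] checksumInPlace(byte[] userBuf, int off, int len) {
        int chunks = (len + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int start = off + i * BYTES_PER_CHECKSUM;
            int n = Math.min(BYTES_PER_CHECKSUM, off + len - start);
            crc.reset();
            crc.update(userBuf, start, n);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Same result, but with one extra copy per chunk into a staging buffer,
    // which is the kind of double buffering this issue wants to avoid.
    static long[] checksumWithCopy(byte[] userBuf, int off, int len) {
        int chunks = (len + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        byte[] staging = new byte[BYTES_PER_CHECKSUM];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int start = off + i * BYTES_PER_CHECKSUM;
            int n = Math.min(BYTES_PER_CHECKSUM, off + len - start);
            System.arraycopy(userBuf, start, staging, 0, n);
            crc.reset();
            crc.update(staging, 0, n);
            sums[i] = crc.getValue();
        }
        return sums;
    }
}
```

Both methods produce identical checksums; the second simply pays for an additional System.arraycopy per chunk. Timing the two over a large buffer would be one way to sanity-check the rough one-third estimate on a given JVM, though I have not included numbers here.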

> FSInputChecker and FSOutputSummer should allow better access to user buffer
> ---------------------------------------------------------------------------
>                 Key: HADOOP-3205
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3205
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
> Implementations of FSInputChecker and FSOutputSummer like DFS do not have access to full
> user buffer. At any time DFS can access only up to 512 bytes even though user usually reads
> with a much larger buffer (often controlled by io.file.buffer.size). This requires implementations
> to double buffer data if an implementation wants to read or write larger chunks of data from
> underlying storage.
> We could separate changes for FSInputChecker and FSOutputSummer into two separate jiras.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
