hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3205) FSInputChecker and FSOutputSummer should allow better access to user buffer
Date Tue, 03 Nov 2009 20:21:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773168#action_12773168 ]

Todd Lipcon commented on HADOOP-3205:

bq. I don't see why we can't use pure java CRC32 from HADOOP-6148

We already use it. The microbenchmark above (reading checksummed files from /dev/shm)
shows that the CRC computation accounts for the majority of the CPU overhead in FSInputChecker,
while array copying makes up very little of the time.

bq. When the user gives large buffer, there is no need to copy to intermediate buffer

I see. So you're suggesting that we do away with the internal BufferedInputStream
in DFSClient.BlockReader and insert a buffer only when the user-provided buffer is small?
That seems like a fair amount of confusing complexity, given the buffer management involved.
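
To make the proposal concrete, here is a minimal sketch of "bypass the internal buffer when the caller's buffer is large". It is not Hadoop code: the class name BypassingInputStream, its counters, and the demo() helper are all hypothetical, and it ignores checksums, mark/reset, and thread safety.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical wrapper, not DFSClient.BlockReader: skips its buffer for large reads. */
class BypassingInputStream extends InputStream {
    private final InputStream in;
    private final byte[] buf;
    private int pos;       // next unread byte in buf
    private int limit;     // number of valid bytes in buf
    long directReads;      // demo counters only
    long bufferFills;

    BypassingInputStream(InputStream in, int bufferSize) {
        if (bufferSize <= 0) throw new IllegalArgumentException("bufferSize must be positive");
        this.in = in;
        this.buf = new byte[bufferSize];
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : (one[0] & 0xff);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > b.length - off) throw new IndexOutOfBoundsException();
        if (len == 0) return 0;
        if (pos == limit) {                       // internal buffer is empty
            if (len >= buf.length) {              // large request: no intermediate copy
                directReads++;
                return in.read(b, off, len);
            }
            int n = in.read(buf, 0, buf.length);  // small request: refill the buffer
            if (n < 0) return -1;
            bufferFills++;
            pos = 0;
            limit = n;
        }
        int n = Math.min(len, limit - pos);       // serve from the internal buffer
        System.arraycopy(buf, pos, b, off, n);
        pos += n;
        return n;
    }

    /** One large read, then one small read: {directReads, bufferFills, first small byte}. */
    static long[] demo() {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 251);
        try (BypassingInputStream s = new BypassingInputStream(new ByteArrayInputStream(data), 4096)) {
            byte[] small = new byte[16];
            s.read(new byte[8192], 0, 8192);      // goes straight to the source
            s.read(small, 0, small.length);       // goes through the internal buffer
            return new long[] {s.directReads, s.bufferFills, small[0] & 0xff};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The branching itself is short; the complexity in question comes from combining it with per-chunk checksum verification. For reference, java.io.BufferedInputStream already takes the direct path when the requested length is at least its buffer size and no mark is set.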

Do we have some kind of benchmark that indicates that these copies make up any appreciable
overhead compared to the fairly slow checksumming?
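
One rough way to get such a number is a standalone microbenchmark along these lines. This is only a sketch under assumptions: java.util.zip.CRC32 stands in for the pure Java CRC32 from HADOOP-6148, the 512-byte chunk size is taken from the issue description, and the class and method names are hypothetical. It involves no Hadoop classes or disk I/O, and results depend heavily on the JVM, so treat the output as indicative at best.

```java
import java.util.Random;
import java.util.zip.CRC32;

/** Hypothetical microbenchmark: per-chunk CRC32 versus a plain array copy. */
class CopyVsChecksum {
    static final int CHUNK = 512;   // assumed bytes per checksum
    static volatile long sink;      // keeps the JIT from discarding the work

    /** Nanoseconds spent checksumming data in CHUNK-sized pieces, repeated rounds times. */
    static long timeChecksum(byte[] data, int rounds) {
        CRC32 crc = new CRC32();
        long acc = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (int off = 0; off < data.length; off += CHUNK) {
                crc.reset();
                crc.update(data, off, Math.min(CHUNK, data.length - off));
                acc += crc.getValue();
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    /** Nanoseconds spent copying data into a second array, repeated rounds times. */
    static long timeCopy(byte[] data, int rounds) {
        byte[] dst = new byte[data.length];
        long acc = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            System.arraycopy(data, 0, dst, 0, data.length);
            acc += dst[r % dst.length];
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    public static void main(String[] args) {
        byte[] data = new byte[8 * 1024 * 1024];
        new Random(42).nextBytes(data);
        timeChecksum(data, 5);      // warm-up so the JIT compiles both loops
        timeCopy(data, 5);
        int rounds = 20;
        long crcNs = timeChecksum(data, rounds);
        long copyNs = timeCopy(data, rounds);
        System.out.printf("checksum: %d ms, copy: %d ms%n",
                crcNs / 1_000_000, copyNs / 1_000_000);
    }
}
```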

> FSInputChecker and FSOutputSummer should allow better access to user buffer
> ---------------------------------------------------------------------------
>                 Key: HADOOP-3205
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3205
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
> Implementations of FSInputChecker and FSOutputSummer like DFS do not have access to full
> user buffer. At any time DFS can access only up to 512 bytes even though user usually reads
> with a much larger buffer (often controlled by io.file.buffer.size). This requires implementations
> to double buffer data if an implementation wants to read or write larger chunks of data from
> underlying storage.
> We could separate changes for FSInputChecker and FSOutputSummer into two separate jiras.
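
As an illustration of what access to the full user buffer could look like, the sketch below checksums a caller-supplied array in place, one chunk at a time, without an intermediate copy. It assumes the 512-byte limit in the description corresponds to one checksum chunk; the class ChunkChecksums and the method checksumChunks are hypothetical and are not part of FSInputChecker or FSOutputSummer.

```java
import java.util.zip.CRC32;

/** Hypothetical helper: per-chunk CRC32 computed directly over the caller's buffer. */
class ChunkChecksums {
    /**
     * Returns one CRC32 value per bytesPerChecksum-sized chunk of buf[off, off + len).
     * The last chunk may be shorter than bytesPerChecksum.
     */
    static long[] checksumChunks(byte[] buf, int off, int len, int bytesPerChecksum) {
        if (bytesPerChecksum <= 0) throw new IllegalArgumentException("bytesPerChecksum must be positive");
        if (off < 0 || len < 0 || len > buf.length - off) throw new IndexOutOfBoundsException();
        int chunks = (len + bytesPerChecksum - 1) / bytesPerChecksum;  // round up
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int chunkOff = off + i * bytesPerChecksum;
            int chunkLen = Math.min(bytesPerChecksum, off + len - chunkOff);
            crc.reset();
            crc.update(buf, chunkOff, chunkLen);  // reads the user buffer in place
            sums[i] = crc.getValue();
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] userBuffer = new byte[1300];       // stands in for a large user read buffer
        for (int i = 0; i < userBuffer.length; i++) userBuffer[i] = (byte) i;
        long[] sums = checksumChunks(userBuffer, 0, userBuffer.length, 512);
        System.out.println(sums.length + " chunk checksums computed");
    }
}
```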

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
