hadoop-common-issues mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3205) FSInputChecker and FSOutputSummer should allow better access to user buffer
Date Tue, 03 Nov 2009 19:21:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773132#action_12773132 ]

Raghu Angadi commented on HADOOP-3205:

bq. This was originally rejected in HADOOP-6148 due to the complexity of maintaining two different

This JIRA is not about CRC32 cost, but I don't see why we can't use the pure Java CRC32 from HADOOP-6148.
It is already used in the DataNode, and the CRC32 implementation is transparent to FSInputChecker. If
it is good enough for several other places in Hadoop, it is good enough for FileSystem as well.
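Because checker code only needs the `java.util.zip.Checksum` interface, swapping one CRC32 implementation for another does not change its logic. A minimal sketch, assuming a hypothetical table-driven `SimpleCrc32` written here for illustration (it is not the HADOOP-6148 class), and a hypothetical `checksumChunk` helper standing in for the per-chunk verification step (not FSInputChecker's actual code):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumSwapDemo {

  /** Hypothetical pure-Java CRC32: table-driven, reflected polynomial 0xEDB88320. */
  static final class SimpleCrc32 implements Checksum {
    private static final int[] TABLE = new int[256];
    static {
      for (int n = 0; n < 256; n++) {
        int c = n;
        for (int k = 0; k < 8; k++) {
          c = (c & 1) != 0 ? 0xEDB88320 ^ (c >>> 1) : c >>> 1;
        }
        TABLE[n] = c;
      }
    }

    private int crc = 0xFFFFFFFF; // standard CRC-32 initial register value

    @Override public void update(int b) {
      // Only the low 8 bits of b are used, as the Checksum contract requires.
      crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >>> 8);
    }

    @Override public void update(byte[] b, int off, int len) {
      for (int i = off; i < off + len; i++) update(b[i]);
    }

    @Override public long getValue() { return (~crc) & 0xFFFFFFFFL; } // final XOR, unsigned
    @Override public void reset() { crc = 0xFFFFFFFF; }
  }

  /** The caller depends only on the Checksum interface, so any implementation plugs in. */
  static long checksumChunk(Checksum sum, byte[] data, int off, int len) {
    sum.reset();
    sum.update(data, off, len);
    return sum.getValue();
  }

  public static void main(String[] args) {
    byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
    long jdk = checksumChunk(new CRC32(), data, 0, data.length);
    long pure = checksumChunk(new SimpleCrc32(), data, 0, data.length);
    System.out.printf("%08x %08x%n", jdk, pure); // prints cbf43926 cbf43926 (CRC-32 check value)
  }
}
```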

bq. Are you suggesting here that we could do away with the internal buffer and assume that
users are always going to do large reads? Doesn't that violate the contract of fs.open taking
a buffer size?

Essentially, yes. When the user gives a large buffer, there is no need to copy into an intermediate
buffer. We would not require or assume that the user gives a large buffer, but in the common case
the user does. DFSClient would read the fixed-length packet header from the underlying socket
and then read the data directly into the user buffer if its size is comparable to or larger than
the packet (64 KB).
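The direct-read path can be sketched as follows, assuming a hypothetical packet format of a 4-byte length header followed by the payload (the real DFS packet header carries more fields). `PacketReader`, `makeStream`, and `drain` are illustrative names, not DFSClient's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class DirectPacketReadDemo {

  /** Reads packets framed by a hypothetical 4-byte length header. */
  static final class PacketReader {
    private final DataInputStream in;
    private byte[] internal = new byte[0]; // staging buffer, used only for small user reads
    private int pos = 0, limit = 0;        // unread region of the staging buffer
    int directReads = 0;                   // packets read straight into the caller's array
    int bufferedReads = 0;                 // packets staged in the internal buffer first

    PacketReader(InputStream raw) { in = new DataInputStream(raw); }

    int read(byte[] buf, int off, int len) throws IOException {
      if (len == 0) return 0;
      if (pos < limit) {                   // drain leftovers of a staged packet first
        int n = Math.min(len, limit - pos);
        System.arraycopy(internal, pos, buf, off, n);
        pos += n;
        return n;
      }
      int payloadLen;
      try {
        payloadLen = in.readInt();         // fixed-length header
      } catch (EOFException e) {
        return -1;                         // no more packets
      }
      if (len >= payloadLen) {             // user buffer holds the whole packet: no extra copy
        in.readFully(buf, off, payloadLen);
        directReads++;
        return payloadLen;
      }
      if (internal.length < payloadLen) internal = new byte[payloadLen];
      in.readFully(internal, 0, payloadLen); // small user buffer: stage, then copy out
      bufferedReads++;
      System.arraycopy(internal, 0, buf, off, len);
      pos = len;
      limit = payloadLen;
      return len;
    }
  }

  /** Builds a stream of equally sized packets in the hypothetical format. */
  static byte[] makeStream(int packets, int payloadLen) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      for (int p = 0; p < packets; p++) {
        out.writeInt(payloadLen);
        out.write(new byte[payloadLen]);
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads everything using a user buffer of the given size; returns total bytes read. */
  static long drain(PacketReader r, int userBufSize) {
    byte[] buf = new byte[userBufSize];
    long total = 0;
    try {
      for (int n; (n = r.read(buf, 0, buf.length)) != -1; ) total += n;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return total;
  }

  public static void main(String[] args) {
    byte[] stream = makeStream(2, 64 * 1024);
    PacketReader large = new PacketReader(new ByteArrayInputStream(stream));
    PacketReader small = new PacketReader(new ByteArrayInputStream(stream));
    System.out.println(drain(large, 128 * 1024) + " bytes, direct=" + large.directReads);   // 131072 bytes, direct=2
    System.out.println(drain(small, 4 * 1024) + " bytes, buffered=" + small.bufferedReads); // 131072 bytes, buffered=2
  }
}
```

With a 128 KB user buffer, both 64 KB packets land directly in the caller's array. With a 4 KB buffer, each packet is staged internally and copied out piecewise, so the internal buffer remains a fallback rather than a requirement.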

I don't see how any of this would violate the contract. The fs.open buffer size is only a hint;
the underlying FS should know what is optimal.

> FSInputChecker and FSOutputSummer should allow better access to user buffer
> ---------------------------------------------------------------------------
>                 Key: HADOOP-3205
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3205
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
> Implementations of FSInputChecker and FSOutputSummer, like DFS, do not have access to the full
> user buffer. At any time DFS can access only up to 512 bytes, even though the user usually reads
> with a much larger buffer (often controlled by io.file.buffer.size). This forces an implementation
> to double-buffer data if it wants to read or write larger chunks of data from the
> underlying storage.
> We could split the changes for FSInputChecker and FSOutputSummer into two separate JIRAs.
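The cost of the 512-byte window can be illustrated on the write side. In this sketch, a hypothetical `Sink` stands in for the layer below FSOutputSummer (it is not FSOutputSummer's real API). `writePerChunk` hands that layer one copied 512-byte chunk at a time, while `writeWholeBuffer` checksums each chunk of the user array in place and passes the whole slice in a single call. For simplicity, a trailing partial chunk is checksummed immediately rather than held back until flush.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

public class ChunkedChecksumWriteDemo {
  static final int BYTES_PER_CHECKSUM = 512;

  /** Hypothetical lower layer: stores data plus one CRC32 per chunk and counts calls. */
  static final class Sink {
    final ByteArrayOutputStream data = new ByteArrayOutputStream();
    final List<Long> sums = new ArrayList<>();
    int calls = 0;

    void write(byte[] b, int off, int len, List<Long> chunkSums) {
      data.write(b, off, len);
      sums.addAll(chunkSums);
      calls++;
    }
  }

  /** CRC32 of each 512-byte chunk of b[off, off+len); a trailing partial chunk gets its own sum. */
  static List<Long> chunkSums(byte[] b, int off, int len) {
    List<Long> out = new ArrayList<>();
    CRC32 crc = new CRC32();
    for (int p = off; p < off + len; p += BYTES_PER_CHECKSUM) {
      crc.reset();
      crc.update(b, p, Math.min(BYTES_PER_CHECKSUM, off + len - p));
      out.add(crc.getValue());
    }
    return out;
  }

  /** Lower layer sees one chunk at a time, copied through a 512-byte buffer. */
  static void writePerChunk(Sink sink, byte[] b, int off, int len) {
    byte[] chunk = new byte[BYTES_PER_CHECKSUM];
    for (int p = off; p < off + len; p += BYTES_PER_CHECKSUM) {
      int n = Math.min(BYTES_PER_CHECKSUM, off + len - p);
      System.arraycopy(b, p, chunk, 0, n);
      sink.write(chunk, 0, n, chunkSums(chunk, 0, n));
    }
  }

  /** Lower layer sees the whole user slice; checksums are computed in place. */
  static void writeWholeBuffer(Sink sink, byte[] b, int off, int len) {
    sink.write(b, off, len, chunkSums(b, off, len));
  }

  public static void main(String[] args) {
    byte[] user = new byte[64 * 1024];
    for (int i = 0; i < user.length; i++) user[i] = (byte) (i * 31);
    Sink perChunk = new Sink(), whole = new Sink();
    writePerChunk(perChunk, user, 0, user.length);
    writeWholeBuffer(whole, user, 0, user.length);
    System.out.println(perChunk.calls + " calls vs " + whole.calls + " call"); // 128 calls vs 1 call
    System.out.println(Arrays.equals(perChunk.data.toByteArray(), whole.data.toByteArray())
        && perChunk.sums.equals(whole.sums));                                  // true
  }
}
```

Both paths produce identical bytes and checksums; only the number of calls into the lower layer differs.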

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
