hadoop-common-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3205) Read multiple chunks directly from FSInputChecker subclass into user buffers
Date Sat, 14 Nov 2009 02:15:39 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777805#action_12777805 ]

Eli Collins commented on HADOOP-3205:
-------------------------------------

Hey Todd,

Nice change! 

It would be good to add a comment where the checksum array gets created, indicating that the
size of the array determines the size of the underlying IO.
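A minimal sketch of the relationship that comment would document (the values and names here are hypothetical, not taken from the patch): one checksum entry covers one data chunk, so the checksum array length bounds how many bytes a single underlying read can cover.

```java
public class ChecksumBufferSizing {
    public static void main(String[] args) {
        // Hypothetical values: 512 data bytes covered per 4-byte CRC entry.
        final int bytesPerChecksum = 512;
        final int checksumSlots = 64;  // length of the checksum array

        // One data chunk per checksum entry, so the array length bounds
        // how much data one underlying IO can cover.
        int maxBytesPerIO = checksumSlots * bytesPerChecksum;
        System.out.println(maxBytesPerIO);  // 32768
    }
}
```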

In readChunk, an invalid-size checksum file previously resulted in a ChecksumException but
now results in an EOFException; it would be good to revert to ChecksumException and add a
unit test for this.
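A sketch of the behavior being asked for, in self-contained form (a plain IOException stands in for Hadoop's org.apache.hadoop.fs.ChecksumException, and the method name is hypothetical): an EOF hit while reading checksum data should be re-signaled as a checksum problem rather than propagating as an EOFException.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class TruncatedChecksumRead {
    // Hypothetical helper: read one 4-byte checksum entry, mapping a
    // truncated stream back to a checksum error (in Hadoop this would
    // throw ChecksumException instead of plain IOException).
    static int readChecksumInt(DataInputStream in) throws IOException {
        try {
            return in.readInt();
        } catch (EOFException e) {
            throw new IOException("Checksum file truncated", e);
        }
    }

    public static void main(String[] args) {
        // Two bytes is an invalid size for a 4-byte CRC entry.
        DataInputStream truncated =
            new DataInputStream(new ByteArrayInputStream(new byte[] {1, 2}));
        try {
            readChecksumInt(truncated);
            System.out.println("no exception");
        } catch (IOException e) {
            System.out.println(e.getMessage());  // Checksum file truncated
        }
    }
}
```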

For sanity, do the existing tests cover both small and odd-size single-block files?

In readChecksum I'd make expected and calculated ints (I know they were longs before your change)
since the code deals with 32-bit sums, and just truncate sum.getValue() rather than cast and
truncate the checksumInts.get() return value. We could assert sum.getValue() == (0xffffffffL &
sum.getValue()) if we're feeling paranoid.
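A sketch of what that suggestion looks like (hypothetical helper name; in the real code the expected value would come from checksumInts.get()): for a 32-bit sum the upper bits of the long are always zero, so truncating to int loses nothing, and the paranoia assert checks exactly that.

```java
import java.util.zip.CRC32;

public class ChecksumCompare {
    // Hypothetical helper: a 32-bit CRC carried in a long always fits in an int.
    static int truncate(long crcValue) {
        // Paranoia check: the upper 32 bits should already be zero.
        assert crcValue == (0xffffffffL & crcValue);
        return (int) crcValue;
    }

    public static void main(String[] args) {
        CRC32 sum = new CRC32();
        byte[] data = "hello".getBytes();
        sum.update(data, 0, data.length);

        // Compare as ints: truncate the long from sum.getValue() rather
        // than widening the int read from the checksum buffer.
        int calculated = truncate(sum.getValue());
        int expected = calculated;  // in readChecksum: checksumInts.get()
        System.out.println(calculated == expected);  // true
    }
}
```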

It would be good to test the LocalFs version (it appears LocalFs is untested), but since the
LocalFs and LocalFileSystem diffs are the same, let's leave that to a separate jira.

Nits:
- Ideally the new illegal argument exception in FSInputChecker.set would be an assert, since
the given checkSumSize is not configurable or passed in at runtime; however, since we don't
enable asserts by default yet, it's perhaps OK to leave as is.

- The new code in readChecksum would be more readable if it were pulled out into a separate
function (as verifySum was).

- This was the case before your change too, but the back-to-back if statements on lines 252-253
could be combined trivially.

Thanks,
Eli

> Read multiple chunks directly from FSInputChecker subclass into user buffers
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-3205
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3205
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Raghu Angadi
>            Assignee: Todd Lipcon
>         Attachments: hadoop-3205.txt, hadoop-3205.txt, hadoop-3205.txt, hadoop-3205.txt
>
>
> Implementations of FSInputChecker and FSOutputSummer like DFS do not have access to the full
user buffer. At any time DFS can access only up to 512 bytes, even though the user usually reads
with a much larger buffer (often controlled by io.file.buffer.size). This requires implementations
to double-buffer data if they want to read or write larger chunks of data from the underlying
storage.
> We could separate changes for FSInputChecker and FSOutputSummer into two separate jiras.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

