hadoop-common-dev mailing list archives

From "Jothi Padmanabhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3514) Reduce seeks during shuffle, by inline crcs
Date Tue, 01 Jul 2008 11:43:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609533#action_12609533 ]

Jothi Padmanabhan commented on HADOOP-3514:

Here is one possible approach to solve this issue.

1. Create all the IFiles on a RawLocalFileSystem instead of the LocalFileSystem (LocalFileSystem
extends ChecksumFileSystem, which we do not want here).
2. Modify all writes to these files to go through an intermediate layer that calculates
and writes a checksum for every 512 bytes of data. On close of the file, create and add a checksum
for the data from the previous checksum to the end of the file.
3. Modify all IFile reads to go through the intermediate layer as well, which will do the
checksum verification transparently to the calling methods.
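
The write and read layers in steps 2 and 3 could look roughly like the sketch below. It is
only an illustration, not the actual patch: the class names (InlineCrcSketch, Writer, Reader),
the use of java.util.zip.CRC32, and the 4-byte big-endian checksum encoding are all assumptions,
and it wraps plain java.io streams rather than Hadoop's file system classes. It needs Java 11
or later for InputStream.readNBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

class InlineCrcSketch {
    static final int CHUNK = 512;  // bytes of data per checksum, as in step 2
    static final int CRC_LEN = 4;  // assumption: CRC32 stored as a big-endian int

    static int crcOf(byte[] buf, int len) {
        CRC32 crc = new CRC32();
        crc.update(buf, 0, len);
        return (int) crc.getValue();
    }

    /** Hypothetical write layer: emits [data chunk][crc] pairs into one file. */
    static class Writer extends FilterOutputStream {
        private final byte[] buf = new byte[CHUNK];
        private int used = 0;

        Writer(OutputStream out) { super(out); }

        @Override public void write(int b) throws IOException {
            buf[used++] = (byte) b;
            if (used == CHUNK) emitChunk();
        }

        private void emitChunk() throws IOException {
            if (used == 0) return;
            out.write(buf, 0, used);
            out.write(ByteBuffer.allocate(CRC_LEN).putInt(crcOf(buf, used)).array());
            used = 0;
        }

        @Override public void close() throws IOException {
            emitChunk();  // checksum for the data after the previous checksum
            super.close();
        }
    }

    /** Hypothetical read layer: verifies each chunk before handing out its data. */
    static class Reader extends InputStream {
        private final InputStream in;
        private byte[] data = new byte[0];
        private int pos = 0;

        Reader(InputStream in) { this.in = in; }

        private boolean fill() throws IOException {
            // A short read can only be the final (partial) chunk plus its checksum.
            byte[] raw = in.readNBytes(CHUNK + CRC_LEN);
            if (raw.length == 0) return false;
            if (raw.length <= CRC_LEN) throw new IOException("truncated chunk");
            int len = raw.length - CRC_LEN;
            int stored = ByteBuffer.wrap(raw, len, CRC_LEN).getInt();
            if (stored != crcOf(raw, len)) throw new IOException("checksum mismatch");
            data = Arrays.copyOf(raw, len);
            pos = 0;
            return true;
        }

        @Override public int read() throws IOException {
            if (pos == data.length && !fill()) return -1;
            return data[pos++] & 0xff;
        }

        // Overridden so that a checksum failure always propagates to the caller.
        @Override public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) return 0;
            if (pos == data.length && !fill()) return -1;
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }

        @Override public void close() throws IOException { in.close(); }
    }

    static byte[] encode(byte[] payload) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new Writer(sink)) { w.write(payload); }
        return sink.toByteArray();
    }

    static byte[] decode(byte[] framed) throws IOException {
        try (Reader r = new Reader(new ByteArrayInputStream(framed))) {
            return r.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] framed = encode(new byte[1000]);
        System.out.println(framed.length);  // 1008: 1000 data bytes + 2 checksums
        framed[700] ^= 1;                   // corrupt one byte in the second chunk
        try { decode(framed); } catch (IOException e) { System.out.println(e.getMessage()); }
    }
}
```

Because data and checksums sit in the same stream, a reader needs only one sequential pass
over one file, which is where the reduction in seeks would come from.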

Modifications will be made only for the files that are written to disk; the InMemory buffer
reads/writes will not be affected.

This approach will have the same checksum overhead as the existing scheme; the only difference
is that the checksums are stored inline in the same file as the data.

An alternative approach would be to have record-level checksums (a checksum for every
key/value pair). This could turn out to be costlier for smaller records, where the
checksum could become a sizeable overhead relative to the record length.
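
To make the trade-off concrete, here is a rough back-of-the-envelope comparison. It assumes a
4-byte checksum (CRC32) and ignores any per-record framing bytes; the class and method names
are made up for illustration, and the record sizes are arbitrary examples.

```java
class ChecksumOverhead {
    static final int CRC_LEN = 4;  // assumption: 4-byte CRC32 per checksum
    static final int CHUNK = 512;  // chunk size from the chunk-level proposal

    /** Checksum bytes when one checksum covers every 512 bytes of data. */
    static long chunkLevelBytes(long records, long recordLen) {
        long dataBytes = records * recordLen;
        long chunks = (dataBytes + CHUNK - 1) / CHUNK;  // final partial chunk counts too
        return chunks * CRC_LEN;
    }

    /** Checksum bytes when every key/value pair carries its own checksum. */
    static long recordLevelBytes(long records) {
        return records * CRC_LEN;
    }

    public static void main(String[] args) {
        // One million 20-byte records: 20,000,000 bytes of data.
        System.out.println(chunkLevelBytes(1_000_000, 20));  // 156252  (about 0.8% of the data)
        System.out.println(recordLevelBytes(1_000_000));     // 4000000 (20% of the data)
        // One thousand 10 KB records: 10,240,000 bytes of data.
        System.out.println(chunkLevelBytes(1_000, 10_240));  // 80000
        System.out.println(recordLevelBytes(1_000));         // 4000
    }
}
```

Under these assumptions, per-record checksums are far more expensive for small records but
cheaper once records are much larger than 512 bytes.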


> Reduce seeks during shuffle, by inline crcs
> -------------------------------------------
>                 Key: HADOOP-3514
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3514
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.19.0
> The number of seeks can be reduced by half in the iFile if we move the crc into the iFile
> rather than having a separate file.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
