hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3514) Reduce seeks during shuffle, by inline crcs
Date Sat, 02 Aug 2008 17:35:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619260#action_12619260 ]

Raghu Angadi commented on HADOOP-3514:
--------------------------------------

There are some more important differences from "traditional checksumming", even when it is
used with one checksum per file:

# The user must start reading from the start of the file and read until the end.
# Any call to skip() in between will throw the checksum off.
# When read() returns data, that does not mean its checksum is correct.
# While reading, the user must know the total file length (so it mostly can't be used for
other streams).
# close() on the input stream closes the underlying stream, but close() on the output stream
does not.
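
The first four constraints above can be illustrated with a minimal sketch. This is NOT the
ChecksumInputStream from the patch; the class name TrailingCrcInputStream, the frame() helper,
and the layout (payload followed by an 8-byte CRC32 trailer) are all assumptions made purely
for illustration. It shows why the reader needs the payload length up front, why data handed
back by read() is unverified until the last byte has been consumed, and why skip() on the
underlying stream breaks verification.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical illustration only; not the class from the HADOOP-3514 patch.
// Assumed layout: [payload bytes][8-byte big-endian CRC32 of the payload].
class TrailingCrcInputStream extends FilterInputStream {
    private final CRC32 crc = new CRC32();
    private long remaining; // payload bytes not yet returned to the caller

    // The caller has to supply the payload length (constraint 4).
    TrailingCrcInputStream(InputStream in, long payloadLength) {
        super(in);
        this.remaining = payloadLength;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : (one[0] & 0xff);
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining == 0) return -1;
        if (len == 0) return 0;
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n == -1) throw new IOException("Stream ended before the declared payload length");
        crc.update(buf, off, n);
        remaining -= n;
        // Verification is only possible once the whole payload has been read,
        // so every earlier read() returned unverified data (constraints 1 and 3).
        if (remaining == 0) verifyTrailer();
        return n;
    }

    // skip() is deliberately not overridden: FilterInputStream.skip() forwards to the
    // underlying stream, so skipped bytes never reach the CRC (constraint 2).

    private void verifyTrailer() throws IOException {
        long stored = new DataInputStream(in).readLong();
        if (stored != crc.getValue()) throw new IOException("Checksum mismatch");
    }

    // Helper that builds the assumed on-disk layout for a payload.
    static byte[] frame(byte[] payload) throws IOException {
        CRC32 c = new CRC32();
        c.update(payload, 0, payload.length);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write(payload);
        out.writeLong(c.getValue());
        out.flush();
        return bytes.toByteArray();
    }
}
```

In this sketch a corrupted payload is only reported by the read() call that consumes the last
payload byte, and a reader that calls skip() ends up failing at the trailer instead of
receiving verified data.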

These are based on my brief look at ChecksumInputStream. It is still possible that these
streams could be used elsewhere, but I doubt they would be in the context of a typical
checksum stream. I would still suggest moving them next to IFile unless there is another
existing context where they could be used.

> Reduce seeks during shuffle, by inline crcs
> -------------------------------------------
>
>                 Key: HADOOP-3514
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3514
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.19.0
>
>         Attachments: hadoop-3514-v1.patch, hadoop-3514-v2.patch, hadoop-3514.patch
>
>
> The number of seeks can be reduced by half in the iFile if we move the crc into the iFile
> rather than having a separate file.


