hadoop-mapreduce-issues mailing list archives

From "Krishna Ramachandran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2020) Use new FileContext APIs for all mapreduce components
Date Thu, 02 Sep 2010 19:27:53 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905649#action_12905649 ]

Krishna Ramachandran commented on MAPREDUCE-2020:

Comments from Chris D

> Quick update:
> I am working on the Map/ReduceTask migration. There are a couple of
> potential blockers for a full migration:
>       • MapTask uses RawFileSystem in some places on local files, which
> likely saves checksum overhead. But there is no convenient way of
> getting a handle if we use LocalFileContext; I am talking to the HDFS
> folks about this.
>               • Near term, we may have to move away from RawFileSystem and
> use LocalFileSystem.

Absolutely not. The spill files require chunking of the checksum,
which (AFAIK) is not available from LocalFileSystem APIs. We don't
require redundant checksumming and, given that this is in an inner
loop, ought not to tolerate it.
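
To make the "chunking of the checksum" idea concrete, here is a minimal sketch of per-chunk checksumming using only the JDK's java.util.zip.CRC32. It is a general illustration, not the MapTask spill code or any Hadoop API: the class name ChunkedChecksum, both method names, and the 512-byte chunk size in the demo are assumptions made up for this example.

```java
import java.util.zip.CRC32;

/** Illustrative only: per-chunk CRC32 over a byte buffer. Not Hadoop code. */
class ChunkedChecksum {

    /** Returns one CRC32 value per chunk of at most bytesPerChunk bytes. */
    static long[] checksums(byte[] data, int bytesPerChunk) {
        if (bytesPerChunk <= 0) {
            throw new IllegalArgumentException("bytesPerChunk must be positive");
        }
        // Round up so a short trailing chunk still gets its own checksum.
        int chunks = (data.length + bytesPerChunk - 1) / bytesPerChunk;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChunk;
            int len = Math.min(bytesPerChunk, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Returns the index of the first chunk whose checksum differs, or -1 if all match. */
    static int firstCorruptChunk(byte[] data, int bytesPerChunk, long[] expected) {
        long[] actual = checksums(data, bytesPerChunk);
        if (actual.length != expected.length) {
            throw new IllegalArgumentException("chunk count mismatch");
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1000];
        long[] sums = checksums(data, 512);   // hypothetical chunk size
        data[600] ^= 1;                       // flip one bit in the second chunk
        System.out.println("first corrupt chunk: " + firstCorruptChunk(data, 512, sums));
    }
}
```

With per-chunk sums, a reader can verify only the region it actually touches. Layering a second checksumming filesystem on top would recompute a checksum over the same bytes on every write, which is the redundant inner-loop cost objected to above.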

>               • This may not be that bad, as CRC operations are native and should be
> fast (even with the raw FS, MapTask enables some approximate checksum using
> ChecksumFileSystem).

The native code is an overhead, not an advantage. Nicholas rewrote the
checksum code in HADOOP-6166 to speed this up in HDFS and MR. AFAIK,
MapTask does not use ChecksumFileSystem anywhere in the data path...
it might use it for ancillary files somewhere, but the spill, merge,
and output serving should use the raw FS exclusively. -C

>       • SequenceFile.createWriter is a Hadoop utility and takes a FileSystem
> as a parameter.
> I have a cleaner solution for exists(); it no longer needs a workaround.

> Use new FileContext APIs for all mapreduce components 
> ------------------------------------------------------
>                 Key: MAPREDUCE-2020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2020
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Krishna Ramachandran
>            Assignee: Krishna Ramachandran
>         Attachments: mapred-2020-1.patch, mapred-2020.patch
> Migrate mapreduce components to using improved FileContext APIs implemented in
> HADOOP-4952 and HADOOP-6223

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
