hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-611) SequenceFile.Sorter should have a merge method that returns an iterator
Date Thu, 09 Nov 2006 08:11:39 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-611?page=comments#action_12448390 ] 
            
Devaraj Das commented on HADOOP-611:
------------------------------------

> I don't understand the ignoreSync, doSync code that you have in the
> SegmentDescriptor. You should never set the sync = null on a Reader. It is done 
> on merge outputs via writer.sync = null to keep the writer from putting in sync 
> blocks, which wastes space since the merge outputs won't be split as map 
> inputs.

This is done so that the merge can also handle inputs that contain sync markers. For example,
the code for readBlock in SequenceFile.java behaves differently depending on whether
reader.sync is null. The temp output is written without syncs, so by default I don't expect
syncs when reading a segment, and the sort output therefore has none. The public merge APIs,
however, take external pathnames as arguments. Those files are assumed to conform strictly to
the sequence file format, so they contain syncs and require the sync checks.

Also, have a look at MergePass.run in SequenceFile.java (without this patch applied). There
is an explicit "reader.sync = null" done there.
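
As a rough illustration of the dependency described above, here is a toy model of a reader
whose block-reading path changes depending on whether its sync field is null. This is not the
SequenceFile code: the class SyncSketch, its methods, and the simplified on-disk layout are all
hypothetical and only mimic the idea.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model only: hypothetical names and format, not the Hadoop SequenceFile classes.
class SyncSketch {
    static final int SYNC_ESCAPE = -1; // marks "a sync marker follows" in this toy format
    static final int SYNC_SIZE = 16;   // marker length assumed for this sketch

    // Writes blocks; a null sync means "emit no sync markers" (like a temp merge output).
    static byte[] writeBlocks(byte[] sync, String... blocks) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String block : blocks) {
                if (sync != null) {
                    out.writeInt(SYNC_ESCAPE);
                    out.write(sync);
                }
                byte[] payload = block.getBytes(StandardCharsets.UTF_8);
                out.writeInt(payload.length);
                out.write(payload);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one block; the marker is expected and checked only when sync is non-null.
    static String readBlock(DataInputStream in, byte[] sync) throws IOException {
        if (sync != null) {
            int escape = in.readInt();
            byte[] check = new byte[SYNC_SIZE];
            in.readFully(check);
            if (escape != SYNC_ESCAPE || !Arrays.equals(sync, check)) {
                throw new IOException("File is corrupt!");
            }
        }
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    // Reads `count` blocks and concatenates them; pass sync = null to skip sync handling.
    static String readAll(byte[] data, byte[] sync, int count) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < count; i++) {
                sb.append(readBlock(in, sync));
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, data written with a marker must be read with the same marker, and data written
without one must be read with sync set to null. Mixing the two fails, which is why a segment
needs to record whether syncs should be expected.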

> SequenceFile.Sorter should have a merge method that returns an iterator
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-611
>                 URL: http://issues.apache.org/jira/browse/HADOOP-611
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>             Fix For: 0.9.0
>
>         Attachments: merge.patch, merge.patch, merge.patch, merge.patch
>
>
> SequenceFile.Sorter should get a new merge method that returns an iterator over the keys/values.
> The current merge method should become a simple method that gets the iterator and writes
> the records out to a file.
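
The shape requested in the issue description can be sketched roughly as follows. This is only
an illustration under stated assumptions: MergeSketch, merge, and mergeToList are hypothetical
names, plain in-memory sorted lists stand in for sorted segments, and a List stands in for the
output file. It is not the API introduced by the attached patches.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Illustrative only: hypothetical names, not the SequenceFile.Sorter API.
class MergeSketch {
    // One sorted input together with its current smallest unread element.
    private static final class Segment<T extends Comparable<T>>
            implements Comparable<Segment<T>> {
        final Iterator<T> source;
        T head;

        Segment(Iterator<T> source) {
            this.source = source;
            this.head = source.next();
        }

        @Override
        public int compareTo(Segment<T> other) {
            return head.compareTo(other.head);
        }
    }

    // Returns an iterator over the merged contents of the already-sorted inputs.
    static <T extends Comparable<T>> Iterator<T> merge(List<? extends Iterable<T>> inputs) {
        final PriorityQueue<Segment<T>> queue = new PriorityQueue<>();
        for (Iterable<T> input : inputs) {
            Iterator<T> it = input.iterator();
            if (it.hasNext()) {
                queue.add(new Segment<>(it)); // skip empty inputs
            }
        }
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !queue.isEmpty();
            }

            @Override
            public T next() {
                Segment<T> smallest = queue.poll();
                if (smallest == null) {
                    throw new NoSuchElementException();
                }
                T result = smallest.head;
                // Re-queue the segment only if it still has records left.
                if (smallest.source.hasNext()) {
                    smallest.head = smallest.source.next();
                    queue.add(smallest);
                }
                return result;
            }
        };
    }

    // The "simple" merge: get the iterator and write every record to the sink.
    static <T extends Comparable<T>> List<T> mergeToList(List<? extends Iterable<T>> inputs) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = merge(inputs);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

With this split, callers that want to consume records directly use the iterator, while the
file-writing variant is just a loop over the same iterator.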

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
