Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-dev@lucene.apache.org
Message-ID: <30433473.1163059899549.JavaMail.jira@brutus>
Date: Thu, 9 Nov 2006 00:11:39 -0800 (PST)
From: "Devaraj Das (JIRA)" <jira@apache.org>
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-611) SequenceFile.Sorter should have a
 merge method that returns an iterator
In-Reply-To: <6429596.1161148355062.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ http://issues.apache.org/jira/browse/HADOOP-611?page=comments#action_12448390 ] 
            
Devaraj Das commented on HADOOP-611:
------------------------------------

> I don't understand the ignoreSync, doSync code that you have in the
> SegmentDescriptor. You should never set the sync = null on a Reader. It is done 
> on merge outputs via writer.sync = null to keep the writer from putting in sync 
> blocks, which wastes space since the merge outputs won't be split as map 
> inputs.

This is done so that we can handle inputs that contain sync markers. For example, the code for readBlock in SequenceFile.java depends on whether reader.sync is null. The temporary output does not have syncs, and by default I don't expect any, so the sort output has no syncs either. The public merge APIs, however, take external pathnames as arguments. Those files are assumed to conform strictly to the sequence file format, and hence they require sync checks.

Also, have a look at MergePass.run in SequenceFile.java (without this patch applied). It already sets "reader.sync = null" explicitly.
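
To illustrate the dependency on reader.sync being null or not, here is a minimal, self-contained sketch. It is not the actual SequenceFile.Reader code: the class SyncSketch, its methods, and the record layout (a length prefix, with a length of -1 announcing a 16-byte sync marker) are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a length-prefixed record stream.
// This is NOT Hadoop's SequenceFile.Reader; every name here is illustrative.
class SyncSketch {
    static final int SYNC_ESCAPE = -1; // assumed length value that announces a sync marker
    static final int SYNC_SIZE = 16;   // assumed sync marker size in bytes

    // Writes the records. When sync is non-null, a sync marker precedes every
    // record (a simplification; a real writer would emit them far less often).
    static byte[] write(List<String> records, byte[] sync) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String record : records) {
            if (sync != null) {
                out.writeInt(SYNC_ESCAPE);
                out.write(sync);
            }
            byte[] data = record.getBytes(StandardCharsets.UTF_8);
            out.writeInt(data.length);
            out.write(data);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the records. Sync markers are recognized and verified only when
    // sync is non-null; with sync == null the input must not contain any.
    static List<String> read(byte[] file, byte[] sync) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        List<String> records = new ArrayList<>();
        while (in.available() > 0) {
            int length = in.readInt();
            if (sync != null && length == SYNC_ESCAPE) {
                byte[] check = new byte[SYNC_SIZE];
                in.readFully(check);
                if (!Arrays.equals(sync, check)) {
                    throw new IOException("File is corrupt!");
                }
                length = in.readInt();
            }
            if (length < 0) {
                throw new IOException("Unexpected record length " + length);
            }
            byte[] data = new byte[length];
            in.readFully(data);
            records.add(new String(data, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        Arrays.fill(sync, (byte) 7);
        List<String> records = Arrays.asList("a", "bb");
        System.out.println(read(write(records, sync), sync)); // external-style input, sync checks on
        System.out.println(read(write(records, null), null)); // temp-style input, no syncs at all
    }
}
```

In this model, reading an input that contains syncs with sync == null fails on the first marker, which is the reason the external inputs keep their sync checks while the temp files can skip them.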

> SequenceFile.Sorter should have a merge method that returns an iterator
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-611
>                 URL: http://issues.apache.org/jira/browse/HADOOP-611
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>             Fix For: 0.9.0
>
>         Attachments: merge.patch, merge.patch, merge.patch, merge.patch
>
>
> SequenceFile.Sorter should get a new merge method that returns an iterator over the keys/values.
> The current merge method should become a simple method that gets the iterator and writes the records out to a file.
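
The shape requested in the issue description above (a merge that returns an iterator, with the file-writing merge reduced to draining that iterator) can be sketched roughly as follows. This is a generic k-way merge over in-memory iterators, not the code in merge.patch; MergeSketch, merge, and mergeTo are hypothetical names, and a plain Collection stands in for the output file.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical sketch of "merge returns an iterator"; not the HADOOP-611 patch.
class MergeSketch {
    // The current head record of one sorted input, plus the rest of that input.
    private static final class Head<T> {
        final T value;
        final Iterator<T> rest;

        Head(T value, Iterator<T> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    // Lazily merges already-sorted inputs into one sorted iterator.
    static <T> Iterator<T> merge(List<Iterator<T>> inputs, Comparator<? super T> cmp) {
        PriorityQueue<Head<T>> heap =
            new PriorityQueue<Head<T>>((a, b) -> cmp.compare(a.value, b.value));
        for (Iterator<T> input : inputs) {
            if (input.hasNext()) {
                heap.add(new Head<>(input.next(), input));
            }
        }
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !heap.isEmpty();
            }

            @Override
            public T next() {
                Head<T> head = heap.poll();
                if (head == null) {
                    throw new NoSuchElementException();
                }
                // Refill the heap from the input that just supplied a record.
                if (head.rest.hasNext()) {
                    heap.add(new Head<>(head.rest.next(), head.rest));
                }
                return head.value;
            }
        };
    }

    // The "simple" merge: get the iterator and write every record to the sink.
    static <T> void mergeTo(List<Iterator<T>> inputs, Comparator<? super T> cmp,
                            Collection<? super T> sink) {
        Iterator<T> merged = merge(inputs, cmp);
        while (merged.hasNext()) {
            sink.add(merged.next());
        }
    }
}
```

Callers that only need to consume the records in order can use merge directly and avoid writing an intermediate file.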
