From: "Devaraj Das (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Thu, 9 Nov 2006 00:11:39 -0800 (PST)
Subject: [jira] Commented: (HADOOP-611) SequenceFile.Sorter should have a merge method that returns an iterator

    [ http://issues.apache.org/jira/browse/HADOOP-611?page=comments#action_12448390 ]

Devaraj Das commented on HADOOP-611:
------------------------------------

> I don't understand the ignoreSync, doSync code that you have in the
> SegmentDescriptor. You should never set the sync = null on a Reader. It is done
> on merge outputs via writer.sync = null to keep the writer from putting in sync
> blocks, which wastes space since the merge outputs won't be split as map
> inputs.

This is done so that we can handle inputs that do contain sync markers. For example, readBlock in SequenceFile.java behaves differently depending on whether reader.sync is null. The temporary output has no syncs, and by default I don't expect any, so the sort output has none either. The public merge APIs that take external pathnames as arguments, however, assume their inputs strictly conform to the SequenceFile format, so they need sync checks.

Also, have a look at MergePass.run in SequenceFile.java (without this patch applied): it explicitly sets "reader.sync = null".

> SequenceFile.Sorter should have a merge method that returns an iterator
> -----------------------------------------------------------------------
>
>          Key: HADOOP-611
>          URL: http://issues.apache.org/jira/browse/HADOOP-611
>      Project: Hadoop
>   Issue Type: New Feature
>   Components: io
>     Reporter: Owen O'Malley
>  Assigned To: Devaraj Das
>      Fix For: 0.9.0
>
>  Attachments: merge.patch, merge.patch, merge.patch, merge.patch
>
>
> SequenceFile.Sorter should get a new merge method that returns an iterator over the keys/values.
> The current merge method should become a simple method that gets the iterator and writes the records out to a file.
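To make the reader.sync / writer.sync convention discussed in this thread concrete, here is a simplified, self-contained Java model. It is not SequenceFile's actual on-disk format or API: `SyncModel`, `SYNC_ESCAPE`, and the `write`/`read` methods are hypothetical names, and the framing (a record length of -1 acting as an escape before a fixed-size sync marker, otherwise length-prefixed records) is an assumption chosen for illustration. It shows why a writer with a null sync saves space, and why a reader with a null sync cannot safely consume input that does contain markers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of sync markers; NOT SequenceFile's real format or API.
// A null sync disables writing markers (writer side) and recognising them (reader side).
final class SyncModel {
    // Assumed escape: a record length of -1 announces that a sync marker follows.
    static final int SYNC_ESCAPE = -1;

    // Writes length-prefixed records; if sync != null, emits escape + marker
    // before every syncInterval-th record (starting with the first).
    static byte[] write(List<String> records, byte[] sync, int syncInterval) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < records.size(); i++) {
                if (sync != null && i % syncInterval == 0) {
                    out.writeInt(SYNC_ESCAPE);
                    out.write(sync);
                }
                byte[] data = records.get(i).getBytes(StandardCharsets.UTF_8);
                out.writeInt(data.length);
                out.write(data);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    // Reads records back; markers are only recognised and verified when sync != null.
    static List<String> read(byte[] file, byte[] sync) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        List<String> records = new ArrayList<>();
        try {
            while (in.available() > 0) {
                int length = in.readInt();
                if (sync != null && length == SYNC_ESCAPE) {
                    byte[] seen = new byte[sync.length];
                    in.readFully(seen);
                    if (!Arrays.equals(seen, sync)) {
                        throw new IllegalStateException("sync marker mismatch");
                    }
                    length = in.readInt(); // length of the record after the marker
                }
                if (length < 0) {
                    // With sync == null the escape is taken as a record length: input looks corrupt.
                    throw new IllegalStateException("unexpected record length " + length);
                }
                byte[] data = new byte[length];
                in.readFully(data);
                records.add(new String(data, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException("truncated input", e);
        }
        return records;
    }
}
```

In this model, a file written without markers reads back correctly whether or not the reader checks for syncs, while a file written with markers is rejected by a reader whose sync is null. That asymmetry is the reason for keeping sync checks on for externally supplied inputs and only dropping them for outputs the sorter writes itself.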
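The change requested in the issue (a merge that hands back an iterator, with the file-writing merge reduced to a thin wrapper around it) can be sketched in Java as a k-way merge over already-sorted runs. This is a hypothetical in-memory illustration, not the Sorter API from the attached patch: `MergeSketch`, `merge`, `mergeTo`, and the use of `Map.Entry` for key/value pairs are assumptions, and a `Consumer` stands in for the output file writer.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical sketch of "merge returns an iterator"; NOT the actual SequenceFile.Sorter API.
final class MergeSketch {

    // One sorted input run together with its current (smallest unconsumed) record.
    private static final class Segment<K, V> {
        final Iterator<Map.Entry<K, V>> records;
        Map.Entry<K, V> head;

        Segment(Iterator<Map.Entry<K, V>> records) {
            this.records = records;
            this.head = records.next();
        }

        // Moves to the next record; returns false once the run is exhausted.
        boolean advance() {
            if (!records.hasNext()) {
                return false;
            }
            head = records.next();
            return true;
        }
    }

    // Lazily k-way merges runs that are each already sorted by key.
    static <K, V> Iterator<Map.Entry<K, V>> merge(
            List<? extends Iterable<Map.Entry<K, V>>> runs, Comparator<? super K> cmp) {
        Comparator<Segment<K, V>> byHeadKey =
                (a, b) -> cmp.compare(a.head.getKey(), b.head.getKey());
        PriorityQueue<Segment<K, V>> queue = new PriorityQueue<>(byHeadKey);
        for (Iterable<Map.Entry<K, V>> run : runs) {
            Iterator<Map.Entry<K, V>> it = run.iterator();
            if (it.hasNext()) {
                queue.add(new Segment<>(it)); // empty runs are skipped
            }
        }
        return new Iterator<Map.Entry<K, V>>() {
            @Override
            public boolean hasNext() {
                return !queue.isEmpty();
            }

            @Override
            public Map.Entry<K, V> next() {
                Segment<K, V> smallest = queue.poll();
                if (smallest == null) {
                    throw new NoSuchElementException();
                }
                Map.Entry<K, V> result = smallest.head;
                if (smallest.advance()) {
                    queue.add(smallest); // re-queue under its new head key
                }
                return result;
            }
        };
    }

    // The old-style merge as a thin wrapper: drain the iterator into a sink
    // (a stand-in for writing records to an output file). Returns the record count.
    static <K, V> int mergeTo(List<? extends Iterable<Map.Entry<K, V>>> runs,
                              Comparator<? super K> cmp, Consumer<Map.Entry<K, V>> sink) {
        int count = 0;
        for (Iterator<Map.Entry<K, V>> it = merge(runs, cmp); it.hasNext(); count++) {
            sink.accept(it.next());
        }
        return count;
    }
}
```

Because the heap holds only one head record per run, records are produced as the caller advances the iterator rather than all at once; a caller that wants a file simply drains the iterator, which is the shape the issue asks the existing merge method to take.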