Subject: Re: How to merge several SequenceFile into one?
From: 丛林
To: mapreduce-user@hadoop.apache.org
Date: Thu, 12 May 2011 19:15:39 +0800

Dear Jason,

If the order of the keys in the SequenceFile is not important to me, in other words, if the sort is unnecessary, how can I disable the distributed sort to save resources?

Thanks for your suggestion.

Best wishes,
-Lin

2011/5/12 jason:
> An M/R job with a single reducer would do the job. This way you can
> utilize the distributed sort and merge/combine/dedupe key/values as you
> wish.
>
> On 5/11/11, 丛林 wrote:
>> Hi all,
>>
>> There are lots of SequenceFiles in HDFS. How can I merge them into one
>> SequenceFile?
>>
>> Thanks for your suggestion.
>>
>> -Lin
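
For readers following the thread, the single-reducer approach jason describes can be pictured with a small plain-Python simulation. This is only a sketch under stated assumptions, not Hadoop code: `merge_with_single_reducer` and `concat_without_sort` are hypothetical names, and lists of (key, value) tuples stand in for SequenceFile records. The first function mimics the sort that feeds one reducer, with a dedupe step; the second shows an order-agnostic merge that simply appends records, which is the behaviour Lin is asking about.

```python
from itertools import chain, groupby
from operator import itemgetter


def merge_with_single_reducer(inputs):
    """Simulate an M/R job with one reducer (illustrative only).

    All records are sorted by key, then duplicate values under the
    same key are dropped before the merged output is produced.
    """
    # "Shuffle and sort": every record from every input reaches the
    # single reducer in key order.
    shuffled = sorted(chain.from_iterable(inputs), key=itemgetter(0))
    merged = []
    for key, group in groupby(shuffled, key=itemgetter(0)):
        seen = set()
        for _, value in group:
            if value not in seen:  # dedupe values per key
                seen.add(value)
                merged.append((key, value))
    return merged


def concat_without_sort(inputs):
    """Order-agnostic merge: append records input by input, no sort."""
    return list(chain.from_iterable(inputs))


if __name__ == "__main__":
    # Hypothetical stand-ins for two small SequenceFiles.
    file_a = [("b", 2), ("a", 1)]
    file_b = [("c", 3), ("a", 1)]
    print(merge_with_single_reducer([file_a, file_b]))
    print(concat_without_sort([file_a, file_b]))
```

The sorted variant pays for ordering and deduplication; the concatenating variant keeps every record in arrival order and does no comparison work at all.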