Date: Mon, 19 Aug 2013 18:09:08 -0400
From: Jerry Lam <chilinglam@gmail.com>
To: user@hadoop.apache.org
Subject: produce a large sequencefile (1TB)

Hi Hadoop users and developers,

I have a use case in which I need to produce a single large sequence file, 1 TB in size. Each datanode has only 200 GB of storage, but I have 30 datanodes, so the cluster holds roughly 6 TB in aggregate.

The problem is that no single reducer can hold 1 TB of data during the reduce phase to generate a single sequence file, even if I use aggressive compression. Because this is a single-reducer job, the datanode running that reducer is the one that runs out of space: the lone reduce task has to merge the full map output on its local disks, and HDFS then places the first replica of every output block on that same datanode, which is far more than its 200 GB.
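For concreteness, here is a stripped-down sketch of the shape of the driver. This is not my actual code: the class name, the identity map/reduce defaults, and the codec are illustrative placeholders, but it shows the single-reducer, SequenceFile-output setup I am describing.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class SingleSequenceFileJob {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "produce single 1TB sequence file");
    job.setJarByClass(SingleSequenceFileJob.class);

    // No mapper/reducer classes set: the identity Mapper and Reducer
    // defaults stand in for the real job logic here.

    // Everything funnels through one reduce task so the output is a
    // single file -- this is what forces one node to hold the whole
    // 1 TB during the reduce phase.
    job.setNumReduceTasks(1);

    // With the default TextInputFormat and identity map/reduce, the
    // output records are (LongWritable offset, Text line) pairs.
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    // Write a SequenceFile with block compression (the "aggressive
    // compression" mentioned above); it still does not shrink 1 TB
    // anywhere near the 200 GB a single datanode can hold.
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, DefaultCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}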
Is there a way to end up with a single sequence file like this without pushing the entire 1 TB through one reduce task on one node?

Any comments or help would be appreciated.

Jerry