From: Michael Segel
To: user@hadoop.apache.org
Subject: Re: Sorting huge text files in Hadoop
Date: Fri, 15 Feb 2013 14:09:00 -0600

Why do you need a 1TB block?

On Feb 15, 2013, at 1:29 PM, Jay Vyas <jayunit100@gmail.com> wrote:

> Well, OK. I guess you could have a 1TB block, do an in-place sort on the file, write it to a tmp directory, and then spill the records in order or something. At that point you might as well not use Hadoop.

Michael Segel  | (m) 312.755.9623

Segel and Associates

