From: Jeff Hammerbacher
To: hive-user@hadoop.apache.org
Cc: "hive-dev@hadoop.apache.org"
Date: Fri, 11 Jun 2010 23:02:23 -0700
Subject: Re: Is anybody working on the globally "order by" of hive ?
In-Reply-To: <3120E6F5005EE7419C125CE166D55E9007770D991C@SC-MBXC1.TheFacebook.com>
References: <2163283F81CD414BB8216E2F4827FFD6078B1072BD@SC-MBXC1.TheFacebook.com> <3120E6F5005EE7419C125CE166D55E9007770D991C@SC-MBXC1.TheFacebook.com>

See https://issues.apache.org/jira/browse/HIVE-1402.

On Fri, Jun 11, 2010 at 1:22 PM, John Sichi <jsichi@facebook.com> wrote:

> If someone is interested in adding parallel ORDER BY to Hive (using
> TotalOrderPartitioner), here's a good starting point:
>
> http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad
>
> The goal would be to take that manual two-step sample-then-sort process and
> turn it into an automatic plan within Hive. I have a better example for the
> sampling query which I haven't published yet.
>
> We would also need to name the final output files in such a way that the
> total order could be iterated via the filenames.
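
For readers unfamiliar with the sample-then-sort idea mentioned in the quoted message, here is a minimal single-process sketch in Python. It is not Hive's plan and not the implementation of Hadoop's TotalOrderPartitioner; it only illustrates the general technique under some assumptions: keys fit in memory, a small random sample is enough to pick range boundaries, and the function names and the zero-padded "part-NNNNN" output names are hypothetical choices made for this illustration.

```python
import bisect
import random


def choose_split_points(sample, num_partitions):
    """Pick up to num_partitions - 1 boundary keys from a sample (step 1: sample)."""
    ordered = sorted(sample)
    if not ordered:
        return []
    # Evenly spaced positions in the sorted sample approximate the key quantiles.
    return [ordered[(len(ordered) * i) // num_partitions]
            for i in range(1, num_partitions)]


def partition_for(key, split_points):
    """Map a key to the index of the key range it falls into."""
    return bisect.bisect_right(split_points, key)


def total_order_sort(keys, num_partitions, sample_size=100, seed=0):
    """Range-partition keys, then sort each partition independently (step 2: sort).

    Returns a dict of hypothetical output names to sorted key lists. The names
    are zero-padded so that lexicographic filename order matches key order.
    """
    keys = list(keys)
    rng = random.Random(seed)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    split_points = choose_split_points(sample, num_partitions)

    buckets = [[] for _ in range(num_partitions)]
    for key in keys:
        buckets[partition_for(key, split_points)].append(key)

    # In a real job each bucket would be sorted by a separate reducer in parallel.
    return {f"part-{i:05d}": sorted(bucket) for i, bucket in enumerate(buckets)}
```

Because every key in one partition is less than or equal to every key in the next, reading the outputs in filename order yields the full sorted sequence, which is the property the last quoted paragraph asks for:

```python
parts = total_order_sort([42, 7, 19, 3, 88, 61, 25, 14], num_partitions=3)
merged = [k for name in sorted(parts) for k in parts[name]]
```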