Date: Sat, 7 Mar 2015 17:02:52 -0600
Subject: sorting in hive -- general
From: max scalf
To: HDP mailing list, Hive Mailing List

Hello all,

I am new to Hadoop and Hive in general, and I am reading "Hadoop: The Definitive Guide" by Tom White. On page 504, in the Hive chapter, Tom says the following with regard to sorting:

*Sorting and Aggregating*

*Sorting data in Hive can be achieved by using a standard ORDER BY clause. ORDER BY performs a parallel total sort of the input (like that described in “Total Sort” on page 261). When a globally sorted result is not required -- and in many cases it isn’t -- you can use Hive’s nonstandard extension, SORT BY, instead. SORT BY produces a sorted file per reducer.*

My question is: what exactly does he mean by a "globally sorted result"? If the SORT BY operation produces a sorted file per reducer, does that mean that at the end of the sort the outputs of all the reducers are put back together to give the correct result?
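To make the question concrete, here is a small Python sketch of how I currently read the passage. It is only a toy model, not Hive itself: the function names sort_by and order_by, the sample rows, and the modulo rule for assigning rows to reducers are all my own assumptions for illustration.

```python
def sort_by(rows, num_reducers):
    """Toy model of SORT BY: each reducer sorts only the rows it receives."""
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        # Assumption: a simple modulo stands in for however rows
        # actually get distributed across reducers.
        partitions[row % num_reducers].append(row)
    # One sorted "file" per reducer; nothing merges them afterwards.
    return [sorted(part) for part in partitions]


def order_by(rows):
    """Toy model of ORDER BY: a single total order over all rows."""
    return sorted(rows)


rows = [7, 2, 9, 4, 1, 8]

per_reducer = sort_by(rows, num_reducers=2)
concatenated = [r for part in per_reducer for r in part]
globally_sorted = order_by(rows)

print("SORT BY, per reducer:", per_reducer)
print("SORT BY, files read back to back:", concatenated)
print("ORDER BY:", globally_sorted)
```

If this reading is right, each per-reducer list is sorted on its own, but reading the files one after another does not give a single sorted sequence, which is what I take "globally sorted" to mean.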