Date: Sun, 10 Apr 2016 21:03:00 +0900
Subject: Re: ORC file sort order ..
From: no jihun
To: user@hive.apache.org

You can force rows to be inserted in sorted order into a SORTED BY table by
setting hive.enforce.sorting=true:
https://github.com/apache/hive/blob/branch-1.2/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1131

However, this setting seems to have been removed in Hive 2.0:
https://issues.apache.org/jira/browse/HIVE-12331

2016-04-10 1:41 GMT+09:00 Mich Talebzadeh:

> Have you tried bucketing by the column plus setting orc.create.index and
> orc.bloom.filter.columns?
>
> CREATE TABLE dummy (
>      ID INT
>    , CLUSTERED INT
>    , SCATTERED INT
>    , RANDOMISED INT
>    , RANDOM_STRING VARCHAR(50)
>    , SMALL_VC VARCHAR(10)
>    , PADDING VARCHAR(10)
> )
> CLUSTERED BY (ID) INTO 256 BUCKETS
> STORED AS ORC
> TBLPROPERTIES (
>   "orc.create.index"="true",
>   "orc.bloom.filter.columns"="ID",
>   "orc.bloom.filter.fpp"="0.05",
>   "orc.compress"="SNAPPY",
>   "orc.stripe.size"="16777216",
>   "orc.row.index.stride"="10000" )
> ;
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 9 April 2016 at 01:53, Gautam wrote:
>
>> Hey,
>>
>> This might be too obvious a question, but I haven't found a way to
>> validate ordering in an ORC file. I need each file to be ordered by a
>> column. Is there a sure-shot way of ensuring the sort order in an ORC
>> file is as I expect it?
>>
>> The closest I've come is using hive --orcfiledump --rowindex <col_id>,
>> which prints that column's min/max values in the index. But that still
>> does not say whether the data within the stripes is sorted.
>>
>> Cheers,
>> -Gautam.
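On the original question of checking order inside the stripes: the row-index min/max can look perfectly ordered while the rows between two index entries are not, so the only direct check is to read the column back in file order and compare neighbouring values. Below is a minimal sketch of that comparison in Python. The helper name first_out_of_order is hypothetical, and how the values are read out of the ORC file in stored order (a dump, an export, or an ORC reader library) is left as an assumption.

```python
from typing import Iterable, Optional, Tuple


def first_out_of_order(values: Iterable) -> Optional[Tuple[int, object, object]]:
    """Return (index, previous, current) for the first pair that breaks
    ascending order, or None if the whole sequence is ascending.

    Assumes the values are mutually comparable; NULLs (None) must be
    filtered or mapped by the caller before calling this helper.
    """
    it = iter(values)
    try:
        prev = next(it)
    except StopIteration:
        return None  # an empty column is trivially sorted
    for index, cur in enumerate(it, start=1):
        if cur < prev:
            return (index, prev, cur)
        prev = cur
    return None


# Two index strides whose min/max ranges (1..3 and 4..6) look ordered,
# although the rows inside each stride are not.
strides = [[1, 3, 2], [4, 6, 5]]
column_in_file_order = [v for stride in strides for v in stride]
violation = first_out_of_order(column_in_file_order)
```

In this example violation points at the row where 2 follows 3, which the min/max summary alone would not reveal. Equal neighbouring values are treated as sorted.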
--
----------------------------------------------
Jihun No ( 노지훈 )
----------------------------------------------
Twitter : @nozisim
Facebook : nozisim
Website : http://jeesim2.godohosting.com
---------------------------------------------------------------------------------
Market Apps : android market products.