Subject: Re: Hive alter table concatenate loses data - can parquet help?
From: Prasanth Jayachandran
To: "user@hive.apache.org"
Date: Tue, 8 Mar 2016 09:35:53 +0000
Hi Marcin

Which Hive version are you using? There have been some fixes to concatenate lately. I will let you know whether your Hive version contains all of them.

Thanks
Prasanth

On Mar 8, 2016, at 3:29 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

Hi

Can you please provide the DDL for this table ("show create table <TABLE>")?


On 7 March 2016 at 23:25, Marcin Tustin <mtustin@handybook.com> wrote:
Hi All,

Following on from our Parquet vs ORC discussion, today I observed Hive's "alter table ... concatenate" command remove rows from an ORC-formatted table.

1. Has anyone else observed this (fuller description below)? And
2. How do Parquet users handle the file fragmentation issue?

Description of the problem:

Today I ran a query to count rows by date. Relevant days below:
2016-02-28  16866
2016-03-06    219
2016-03-07   2863
I then ran concatenation on that table. Rerunning the same query resulted in:

2016-02-28  16866
2016-03-06    219
2016-03-07   1158

Note the reduced count for 2016-03-07 (from 2863 to 1158).

I then ran concatenation a second time, and the query a third time:
2016-02-28  16344
2016-03-06    219
2016-03-07   1158

Now the count for 2016-02-28 has dropped as well (from 16866 to 16344).

This doesn't look like duplicate elimination happening by design: the rows did not all disappear on the first run of concatenation. It looks like concatenation simply loses data.
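The reasoning above rests on idempotence. A by-design cleanup such as deduplication removes everything it will ever remove on the first pass, so a second pass over the same data should remove nothing. Below is a minimal Python sketch of that check, using the per-date counts quoted in this thread. It assumes no rows were written to or deleted from the table between runs. The helpers `row_losses` and `later_pass_losses` are hypothetical and are not part of Hive.

```python
from typing import Dict, List

Counts = Dict[str, int]  # date -> row count

# Per-date counts from the thread: before concatenation,
# after the 1st concatenation, after the 2nd concatenation.
RUNS: List[Counts] = [
    {"2016-02-28": 16866, "2016-03-06": 219, "2016-03-07": 2863},
    {"2016-02-28": 16866, "2016-03-06": 219, "2016-03-07": 1158},
    {"2016-02-28": 16344, "2016-03-06": 219, "2016-03-07": 1158},
]


def row_losses(before: Counts, after: Counts) -> Counts:
    """Rows lost per date between two runs; dates with no drop are omitted."""
    losses: Counts = {}
    for date, n_before in before.items():
        n_after = after.get(date, 0)
        if n_after < n_before:
            losses[date] = n_before - n_after
    return losses


def later_pass_losses(runs: List[Counts]) -> Counts:
    """Total rows lost on the 2nd and later passes.

    An idempotent cleanup (e.g. deduplication) should leave this empty.
    Anything here suggests data loss, assuming no writes between runs.
    """
    total: Counts = {}
    for before, after in zip(runs[1:], runs[2:]):
        for date, lost in row_losses(before, after).items():
            total[date] = total.get(date, 0) + lost
    return total


if __name__ == "__main__":
    print("1st pass:", row_losses(RUNS[0], RUNS[1]))
    print("later passes:", later_pass_losses(RUNS))
```

With the thread's numbers, the first pass drops 1705 rows for 2016-03-07. The second pass then drops 522 rows for 2016-02-28, a date the first pass left untouched. That second-pass loss is what rules out an idempotent, by-design cleanup.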


