hive-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Hive alter table concatenate loses data - can parquet help?
Date Tue, 08 Mar 2016 09:29:28 GMT
Hi,

Can you please provide the DDL for this table, i.e. the output of "show create table <TABLE>"?

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 7 March 2016 at 23:25, Marcin Tustin <mtustin@handybook.com> wrote:

> Hi All,
>
> Following on from our Parquet vs ORC discussion, today I observed
> Hive's alter table ... concatenate command remove rows from an
> ORC-formatted table.
>
> 1. Has anyone else observed this (fuller description below)? And
> 2. How do Parquet users handle the file fragmentation issue?
>
> Description of the problem:
>
> Today I ran a query to count rows by date. Relevant days below:
> 2016-02-28 16866
> 2016-03-06 219
> 2016-03-07 2863
> I then ran concatenation on that table. Rerunning the same query resulted
> in:
>
> 2016-02-28 16866
> 2016-03-06 219
> 2016-03-07 1158
>
> Note the reduced count for 2016-03-07.
>
> I then ran concatenation a second time, and the query a third time:
> 2016-02-28 16344
> 2016-03-06 219
> 2016-03-07 1158
>
> Now the count for 2016-02-28 is reduced.
>
> This doesn't look like deliberate elimination of duplicates: the losses
> didn't all happen on the first run of concatenation. It looks like
> concatenation simply loses data.
>
>
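
The per-date counts quoted above can be compared mechanically to see which dates lost rows in each concatenation run. The following is only a small sketch: the counts are copied from the quoted message, while the variable names and the `rows_lost` helper are made up for illustration and are not part of Hive.

```python
# Row counts per date, copied from the three query runs quoted above.
before = {"2016-02-28": 16866, "2016-03-06": 219, "2016-03-07": 2863}
after_first = {"2016-02-28": 16866, "2016-03-06": 219, "2016-03-07": 1158}
after_second = {"2016-02-28": 16344, "2016-03-06": 219, "2016-03-07": 1158}


def rows_lost(earlier, later):
    """Return {date: rows lost} for every date whose count dropped.

    Hypothetical helper: a date missing from `later` is treated as count 0.
    """
    return {
        day: earlier[day] - later.get(day, 0)
        for day in earlier
        if later.get(day, 0) < earlier[day]
    }


first_loss = rows_lost(before, after_first)
second_loss = rows_lost(after_first, after_second)

print(first_loss)   # {'2016-03-07': 1705}
print(second_loss)  # {'2016-02-28': 522}
```

Each run affects a different date, which matches the observation that the losses did not all happen on the first concatenation.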
