hive-user mailing list archives

From Marcin Tustin <>
Subject Hive alter table concatenate loses data - can parquet help?
Date Mon, 07 Mar 2016 23:25:56 GMT
Hi All,

Following on from our parquet vs orc discussion, today I observed
hive's alter table ... concatenate command remove rows from an ORC
formatted table.

1. Has anyone else observed this (fuller description below)? And
2. How do parquet users handle the file fragmentation issue?

Description of the problem:

Today I ran a query to count rows by date. Relevant days below:
2016-02-28 16866
2016-03-06 219
2016-03-07 2863
I then ran concatenation on that table. Rerunning the same query resulted in:

2016-02-28 16866
2016-03-06 219
2016-03-07 1158

Note the reduced count for 2016-03-07.

I then ran concatenation a second time, and the query a third time:
2016-02-28 16344
2016-03-06 219
2016-03-07 1158

Now the count for 2016-02-28 is reduced.

This doesn't look like an elimination of duplicates occurring by design:
the losses didn't all happen on the first run of concatenation. It looks
like concatenation simply loses data.
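
For anyone who wants to check their own tables, here is a minimal sketch of the comparison I did by eye. It only uses the counts quoted above; the names runs and rows_lost are made up for illustration, and in practice the dictionaries would be filled from the output of the count-by-date query rather than typed in.

```python
# Row counts per date, copied from the three query runs quoted above.
runs = [
    {"2016-02-28": 16866, "2016-03-06": 219, "2016-03-07": 2863},  # before concatenation
    {"2016-02-28": 16866, "2016-03-06": 219, "2016-03-07": 1158},  # after first concatenation
    {"2016-02-28": 16344, "2016-03-06": 219, "2016-03-07": 1158},  # after second concatenation
]


def rows_lost(before, after):
    """Return {date: number of rows lost} for every date whose count dropped.

    A date missing from `after` is treated as having zero rows.
    """
    return {
        day: before[day] - after.get(day, 0)
        for day in before
        if after.get(day, 0) < before[day]
    }


# Compare each run with the one before it.
for i in range(1, len(runs)):
    print(f"concatenation {i}:", rows_lost(runs[i - 1], runs[i]))
```

With the numbers above this reports 1705 rows lost from 2016-03-07 on the first concatenation and 522 rows lost from 2016-02-28 on the second.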

