Dr Mich Talebzadeh
Hi All,

Following on from our Parquet vs ORC discussion, today I observed Hive's alter table ... concatenate command remove rows from an ORC-formatted table.

1. Has anyone else observed this (fuller description below)? And
2. How do Parquet users handle the file fragmentation issue?

Description of the problem:

Today I ran a query to count rows by date. Relevant days below:

2016-02-28 16866
2016-03-06 219
2016-03-07 2863

I then ran concatenation on that table. Rerunning the same query resulted in:

2016-02-28 16866
2016-03-06 219
2016-03-07 1158

Note the reduced count for 2016-03-07.

I then ran concatenation a second time, and the query a third time:

2016-02-28 16344
2016-03-06 219
2016-03-07 1158

Now the count for 2016-02-28 is reduced.

This doesn't look like an elimination of duplicates occurring by design - the reductions didn't all happen on the first run of concatenation. It looks like concatenation just kind of loses data.
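For anyone who wants to check their own tables the same way, here is a minimal sketch in Python that compares per-date counts between two runs of the count query and reports which dates lost rows. The helper name lost_rows and the dict layout are just illustrative assumptions, not part of Hive; the numbers are the counts quoted above, typed in by hand.

```python
def lost_rows(before, after):
    """Return {date: rows_lost} for every date whose count dropped.

    `before` and `after` map a date string to the row count from the
    count-by-date query, taken before and after a concatenation run.
    A date missing from `after` is treated as having zero rows.
    """
    return {
        day: before[day] - after.get(day, 0)
        for day in sorted(before)
        if after.get(day, 0) < before[day]
    }


# Counts from the three query runs described in the post.
run1 = {"2016-02-28": 16866, "2016-03-06": 219, "2016-03-07": 2863}
run2 = {"2016-02-28": 16866, "2016-03-06": 219, "2016-03-07": 1158}
run3 = {"2016-02-28": 16344, "2016-03-06": 219, "2016-03-07": 1158}

print(lost_rows(run1, run2))  # {'2016-03-07': 1705}
print(lost_rows(run2, run3))  # {'2016-02-28': 522}
```

So the first concatenation dropped 1705 rows for 2016-03-07 and the second dropped 522 rows for 2016-02-28, with 2016-03-06 untouched both times.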