hive-user mailing list archives

From Prasanth Jayachandran <pjayachand...@hortonworks.com>
Subject Re: Hive alter table concatenate loses data - can parquet help?
Date Tue, 15 Mar 2016 02:42:26 GMT
Hi Marcin

I came across this issue recently. Do you have old ORC files (created with Hive 0.11) in the
table/partition? If so, this patch is required:

https://issues.apache.org/jira/browse/HIVE-13285

Thanks
Prasanth

On Mar 10, 2016, at 5:02 PM, Prasanth Jayachandran <pjayachandran@hortonworks.com> wrote:

After Hive 1.2.1, one patch related to ALTER TABLE ... CONCATENATE went in: https://issues.apache.org/jira/browse/HIVE-12450

I am not sure if it's related, though. Could you please file a bug for this? It would be great
if you could attach a small enough repro for this issue. I can verify it and provide a fix if
it turns out to be a bug.

Thanks
Prasanth

On Mar 8, 2016, at 5:52 AM, Marcin Tustin <mtustin@handybook.com> wrote:

Hi Mich,

DDL below.

Hi Prasanth,

Hive version as reported by Hortonworks is 1.2.1.2.3.

Thanks,
Marcin


CREATE TABLE `<tablename>`(
  `col1` string,
  `col2` bigint,
  `col3` string,
  `col4` string,
  `col4` string,
  `col5` bigint,
  `col6` string,
  `col7` string,
  `col8` string,
  `col9` string,
  `col10` boolean,
  `col11` boolean,
  `col12` string,
  `metadata` struct<file:string,hostname:string,level:string,line:bigint,logger:string,method:string,millis:bigint,pid:bigint,timestamp:string>,
  `col14` string,
  `col15` bigint,
  `col16` double,
  `col17` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://reporting-handy/<path>'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='2800',
  'numRows'='297263',
  'rawDataSize'='454748401',
  'totalSize'='31310353',
  'transient_lastDdlTime'='1457437204')

Time taken: 1.062 seconds, Fetched: 34 row(s)

On Tue, Mar 8, 2016 at 4:29 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
Hi

Can you please provide the DDL for this table ("show create table <TABLE>")?

Dr Mich Talebzadeh

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 7 March 2016 at 23:25, Marcin Tustin <mtustin@handybook.com> wrote:
Hi All,

Following on from our Parquet vs ORC discussion, today I observed Hive's ALTER TABLE
... CONCATENATE command remove rows from an ORC-formatted table.

1. Has anyone else observed this (fuller description below)? And
2. How do Parquet users handle the file fragmentation issue?

Description of the problem:

Today I ran a query to count rows by date. Relevant days below:
2016-02-28 16866
2016-03-06 219
2016-03-07 2863
I then ran concatenation on that table. Rerunning the same query resulted in:

2016-02-28 16866
2016-03-06 219
2016-03-07 1158

Note the reduced count for 2016-03-07.

I then ran concatenation a second time, and the query a third time:
2016-02-28 16344
2016-03-06 219
2016-03-07 1158

Now the count for 2016-02-28 is reduced.

This doesn't look like an elimination of duplicates occurring by design: the losses didn't all
happen on the first run of concatenation. It looks like concatenation simply loses data.
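The before/after counts above can be diffed mechanically to see how many rows each concatenation pass dropped per day. This is a minimal sketch, assuming the per-day counts have already been copied out of the Hive query results into dictionaries; the `lost_rows` helper is hypothetical and written only for illustration.

```python
# Per-day row counts copied from the thread (day -> row count).
before = {"2016-02-28": 16866, "2016-03-06": 219, "2016-03-07": 2863}
after_first = {"2016-02-28": 16866, "2016-03-06": 219, "2016-03-07": 1158}
after_second = {"2016-02-28": 16344, "2016-03-06": 219, "2016-03-07": 1158}


def lost_rows(old, new):
    """Hypothetical helper: return {day: rows_missing} for days whose count dropped."""
    return {
        day: count - new.get(day, 0)
        for day, count in old.items()
        if new.get(day, 0) < count
    }


print(lost_rows(before, after_first))        # first CONCATENATE pass
print(lost_rows(after_first, after_second))  # second CONCATENATE pass
```

Applied to the numbers above, the first pass drops 1705 rows from 2016-03-07 and the second drops 522 rows from 2016-02-28, while 2016-03-06 is unchanged both times. A diff like this only shows that rows went missing; by itself it cannot distinguish lost data from removed duplicates.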



Want to work at Handy? Check out our culture deck and open roles<http://www.handy.com/careers>
Latest news<http://www.handy.com/press> at Handy
Handy just raised $50m<http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
led by Fidelity

