hive-user mailing list archives

From no jihun <>
Subject ORC compaction not happen.
Date Fri, 08 Apr 2016 14:01:13 GMT

Can anyone give me some advice?

I am trying to make this scenario work.

A. Create a bucketed ORC table.

  -- column types are assumed; the original query omits them
  create table table_orc (field1 string, field2 string)
  clustered by (field1, field2) into 64 buckets
  stored as ORC
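For comparison, the Hive Transactions documentation describes ACID tables (the ones that get delta files and compaction) as needing to be bucketed, stored as ORC, and explicitly marked transactional. A minimal sketch of the same table under that assumption, for the Hive 1.x releases current at the time; the STRING column types are assumed because the message does not give them:

```sql
-- Sketch only: STRING column types are an assumption (not given in the message).
-- Without 'transactional'='true', each INSERT INTO a bucketed ORC table adds
-- extra *_copy_N files per bucket instead of writing ACID delta directories.
CREATE TABLE table_orc (field1 STRING, field2 STRING)
CLUSTERED BY (field1, field2) INTO 64 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```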

B. Add rows to table_orc *HOURLY*.

  insert into table_orc
  select * from hourly_row_2016040821
  distribute by (field1, field2)

After creating the table with query A and running query B once,
there is one file per bucket.
[image: inline image 1]

One hour later, I run query B again to import the next hour's data
into the same table:

  insert into table_orc
  select * from hourly_row_2016040822
  distribute by (field1, field2)

I expected to see some transaction (delta) files, as the ORC
documentation describes:
  [image: inline image 2]

But I only found XXXX_copy_i files.
  [image: inline image 3]

And compaction never happens.

These are my ACID settings in Ambari:
  [image: inline image 4]
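The Ambari screenshot is not preserved in the archive. For reference, the Hive Transactions documentation lists roughly the following settings for ACID tables. This is a hedged sketch written as HiveQL SET statements; the values are typical ones, not the poster's actual configuration, and the compactor settings only take effect on the metastore service, not in a client session:

```sql
-- Client / HiveServer2 side (per session or in hive-site.xml):
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.enforce.bucketing=true;   -- Hive 1.x only; always enforced from Hive 2.0
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Metastore side (hive-site.xml of the metastore, not settable per session):
--   hive.compactor.initiator.on=true
--   hive.compactor.worker.threads=1   -- must be > 0 or no compaction runs
```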

Is this expected result?

How can I run "insert into X select from Y" multiple times
and still keep one file per bucket through compaction?
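Assuming the table has been recreated as transactional, a compaction can also be requested by hand rather than waiting for the automatic initiator. A hedged sketch using the documented HiveQL compaction statements; it does not apply to the original non-transactional table:

```sql
-- Only valid on a transactional (ACID) table.
-- A major compaction rewrites the base and all deltas into a new base,
-- which holds one file per bucket once the cleaner removes the old directories.
ALTER TABLE table_orc COMPACT 'major';

-- Inspect queued, running, and completed compaction requests.
SHOW COMPACTIONS;
```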

Is there no way to do this with insert queries?

Any advice will be appreciated.

Thank you.
