hive-user mailing list archives

From no jihun <jees...@gmail.com>
Subject ORC compaction not happen.
Date Fri, 08 Apr 2016 14:01:13 GMT
Hello.

Can anyone give me some advice?

I am trying to make this scenario work.

A. Create a bucketed ORC table.

  create table table_orc ( field1, field2 )
  clustered by (field1, field2) into 64 buckets
  stored as ORC


B. Add rows to table_orc *HOURLY*.

  insert into table_orc
  select * from hourly_row_2016040821
  distribute by (field1, field2)


After creating the table with query A and running query B once,
there is one file per bucket.
[image: inline image 1]


One hour later, I run query B again to import the next hour's data
into the same table:

  insert into table_orc
  select * from hourly_row_2016040822
  distribute by (field1, field2)


I expected to see some transaction (delta) files,
  as the ORC documentation describes (https://orc.apache.org/docs/acid.html).
  [image: inline image 2]


But I found only XXXX_copy_i files.
  [image: inline image 3]

And compaction never happens.
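
A sketch of HiveQL statements that could help inspect this situation. This is an assumption, not part of the run described above: only the table name table_orc comes from this message, and the output will depend on the Hive version and configuration.

```sql
-- List the table's parameters and storage details
-- (bucketing, file format, and any table properties that were set).
DESCRIBE FORMATTED table_orc;

-- Show only the table properties.
SHOW TBLPROPERTIES table_orc;

-- List compaction requests known to the metastore and their state.
SHOW COMPACTIONS;

-- Request a major compaction manually instead of waiting for
-- the automatic initiator.
ALTER TABLE table_orc COMPACT 'major';
```

Comparing the SHOW COMPACTIONS output before and after the ALTER TABLE statement should show whether a compaction request was queued at all.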

These are the ACID settings in Ambari.
  [image: inline image 4]


Is this the expected result?

How can I run multiple "insert into X select from Y" statements
and still keep one file per bucket through compaction?

Is there no way to do this with insert queries?


Any advice will be appreciated.

Thank you.
