hive-user mailing list archives

From Juraj jiv <fatcap....@gmail.com>
Subject Hive 12 - CDH 5.0.1 - many small files when using ORC table
Date Tue, 18 Aug 2015 15:28:13 GMT
Hello all,

I have a question about the ORC table format. We use it for our datastore
tables, but during maintenance I noticed that there are many small files
inside the tables which, I presume, don't contain any data. They are only
43 bytes in size and make up around 60% of all files in the table folder.

For example (counting files that are 43 bytes in size, then all other files):

hadoop@hadoopnn:~$ hdfs dfs -du -h \
    /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 \
    | grep "^43 " | wc -l
7448
hadoop@hadoopnn:~$ hdfs dfs -du -h \
    /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 \
    | grep -v "^43 " | wc -l
4712
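
Both counts can also be taken in one pass over the listing. This is only a
minimal sketch: it assumes `hdfs dfs -du` is run without `-h`, so that the
first column is a plain byte count, and the function name `count_by_size` is
made up for illustration.

```shell
# Count files of exactly `target` bytes versus all other files.
# Reads `hdfs dfs -du` output on stdin (first column = size in bytes).
# count_by_size is a hypothetical helper name, not part of any Hadoop tool.
count_by_size() {
  target="${1:-43}"
  awk -v t="$target" '
    NF == 0 { next }             # skip blank lines
    $1 == t { small++; next }    # size matches the target exactly
            { other++ }          # everything else
    END     { printf "%d %d\n", small, other }
  '
}

# Usage (the path is the same placeholder as above):
#   hdfs dfs -du /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 \
#       | count_by_size 43
```

It prints two numbers: the count of files matching the target size, then the
count of all others.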

Why is that? Why are there so many 43-byte files?

The ASCII content of these files is shown below; I guess it is just the ORC header:
0@▒▒▒"
      ▒▒ORC

Hive version:
0.12.0+cdh5.0.1+315     1.cdh5.0.1.p0.31     CDH 5

Thanks
JV
