hive-user mailing list archives

From Juraj jiv <fatcap....@gmail.com>
Subject Re: Hive 12 - CDH 5.0.1 - many small files when using ORC table
Date Tue, 18 Aug 2015 15:46:36 GMT
Hi, yes, I saw bucketing enabled ad hoc somewhere in our SQL scripts via the
set command: "hive.enforce.bucketing" + "hive.optimize.bucketmapjoin". So is
that metadata information required? Can't I just delete those 43-byte files?

JV

On Tue, Aug 18, 2015 at 5:35 PM, Prasanth Jayachandran <
j.prasanth.j@gmail.com> wrote:

> Are you using bucketing? If so, those are empty ORC files that contain only
> metadata information and no data.
>
>
> _____________________________
> From: Juraj jiv <fatcap.jiv@gmail.com>
> Sent: Tuesday, August 18, 2015 8:28 AM
> Subject: Hive 12 - CDH 5.0.1 - many small files when using ORC table
> To: <user@hive.apache.org>
>
>
>
> Hello all,
>
> I have a question about the ORC table format. We use it for our datastore
> tables, but during maintenance I noticed there are many small files inside
> the tables which I presume don't contain any data. They are only 43 bytes in
> size and account for around 70% of all files inside the table folder.
>
> For example (counting files that are 43 bytes in size, then all the others):
>
> hadoop@hadoopnn:~$ hdfs dfs -du -h
> /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 |
> grep "^43 " | wc -l
> 7448
> hadoop@hadoopnn:~$ hdfs dfs -du -h
> /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 |
> grep -v "^43 " | wc -l
> 4712
>
> Why is that? Why are there so many 43-byte files?
>
> The ASCII content of the files is below, which I guess is just the ORC header:
> 0@▒▒▒"
>       ▒▒ORC
>
> hive version:
> 0.12.0+cdh5.0.1+315     1.cdh5.0.1.p0.31     CDH 5
>
> Thanks
> JV
>
>
>
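
The two grep pipelines quoted above can be combined into a single pass. The following is only a minimal sketch, assuming the listing has the byte size in the first column and the path in the last (as produced by `hdfs dfs -du` without `-h`). The function name `count_by_size` and the sample paths are hypothetical and used only for illustration.

```shell
# Count entries whose first column (size in bytes) equals $1, and all others.
# Reads `hdfs dfs -du` style lines ("<size> ... <path>") on stdin.
# Prints "<matching> <other>" on one line.
count_by_size() {
  awk -v target="$1" '
    NF < 2 || $1 !~ /^[0-9]+$/ { next }   # skip blank or unparseable lines
    $1 == target { match_count++; next }  # size equals the target size
    { other_count++ }                     # any other file
    END { printf "%d %d\n", match_count, other_count }
  '
}

# Hypothetical sample input standing in for real `hdfs dfs -du` output.
printf '%s\n' \
  '43  /warehouse/t/part=1/000000_0' \
  '1048576  /warehouse/t/part=1/000001_0' \
  '43  /warehouse/t/part=1/000002_0' | count_by_size 43
```

Against a real partition the listing would be piped in instead, for example `hdfs dfs -du <partition path> | count_by_size 43`. This only counts files; it makes no claim about whether the small files are safe to delete.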
