hive-user mailing list archives

From Prasanth Jayachandran <j.prasant...@gmail.com>
Subject Re: Hive 12 - CDH 5.0.1 - many small files when using ORC table
Date Tue, 18 Aug 2015 15:35:40 GMT
Are you using bucketing? If so, those are empty ORC files that hold only metadata and no
data.


_____________________________
From: Juraj jiv <fatcap.jiv@gmail.com>
Sent: Tuesday, August 18, 2015 8:28 AM
Subject: Hive 12 - CDH 5.0.1 - many small files when using ORC table
To:  <user@hive.apache.org>

Hello all,

I have a question about the ORC table format. We use it for our datastore tables, but during maintenance
I noticed there are many small files inside the tables which I presume don't contain any data.
They are only 43 bytes in size and they make up around 70% of all files inside the table folder.

For example (counting the files that are 43 bytes in size, then all the others):

hadoop@hadoopnn:~$ hdfs dfs -du -h /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 | grep "^43 " | wc -l
7448
hadoop@hadoopnn:~$ hdfs dfs -du -h /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 | grep -v "^43 " | wc -l
4712
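
The two grep counts above could also be produced in one pass. The following is only a minimal sketch, not part of the original message: it assumes the listing comes from `hdfs dfs -du` run without `-h`, so that the first whitespace-separated column of each line is the file size in bytes. The function names `count_by_size` and `share` and the sample paths are hypothetical.

```python
from typing import Iterable, Tuple


def count_by_size(du_lines: Iterable[str], target_bytes: int = 43) -> Tuple[int, int]:
    """Return (matching, other) file counts from `hdfs dfs -du` style lines.

    Assumption: the first column of each line is the size in bytes
    (i.e. the listing was produced without the -h flag).
    """
    matching = 0
    other = 0
    for line in du_lines:
        fields = line.split()
        # Skip blank lines and anything that does not start with a byte count.
        if not fields or not fields[0].isdigit():
            continue
        if int(fields[0]) == target_bytes:
            matching += 1
        else:
            other += 1
    return matching, other


def share(matching: int, other: int) -> float:
    """Fraction of files that have the target size (0.0 if there are no files)."""
    total = matching + other
    return matching / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sample listing, for illustration only.
    sample = [
        "43  /warehouse/t/part=2015-07-30/000000_0",
        "1048576  /warehouse/t/part=2015-07-30/000001_0",
        "43  /warehouse/t/part=2015-07-30/000002_0",
    ]
    small, rest = count_by_size(sample)
    print(small, rest, share(small, rest))
```

With the counts reported above, `share(7448, 4712)` gives 0.6125, i.e. about 61% of the files in this one partition are 43 bytes.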
     
Why is that? Why are there so many 43-byte files?

The ASCII content of the files is shown here, which I guess is just the ORC header:

     0@▒▒▒"
     
      ▒▒ORC
     

    
    
     
Hive version:
0.12.0+cdh5.0.1+315     1.cdh5.0.1.p0.31     CDH 5

Thanks,
JV