hive-user mailing list archives

From Alan Gates <alanfga...@gmail.com>
Subject Re: Compaction - get compacted files
Date Thu, 13 Apr 2017 17:04:29 GMT
Answers inline.

Alan.

> On Mar 29, 2017, at 03:08, Riccardo Iacomini <riccardo.iacomini@rdslab.com> wrote:
> 
> Hello,
> I have some questions about the compaction process. I need to manually trigger compaction
> operations on a standard partitioned orc table (not ACID), and be able to get back the list
> of compacted files. I could achieve this via HDFS, getting the directory listing and then
> triggering the compaction, but will imply stopping the underlying processing to avoid new
> files to be added in between. Here are some questions I could not answer myself from the material
> I found online:
> 	• Is the compaction executed as a MapReduce job?
Yes.

> 
> 	• Is there a way to get back the list of compacted files?
No.  Note that even a directory listing in HDFS will be somewhat confusing, because producing
the new delta or base file (depending on whether it's a minor or major compaction) is decoupled
from removing the old delta and/or base files.  Readers may still be using the old files,
and the cleanup cannot run until those readers have finished.
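
To illustrate why a raw listing is confusing, here is a minimal Python sketch that splits the directory names found under one partition into those still current and those already superseded by the newest base but not yet cleaned up. It assumes the usual ACID naming (base_N and delta_MIN_MAX, with an optional trailing statement id), and the function name is hypothetical. It does not detect deltas superseded by a wider delta from a minor compaction, and it is not how Hive itself selects files.

```python
import re

# Assumed directory naming: base_<txn> and delta_<min>_<max>[_<stmt>].
BASE_RE = re.compile(r"^base_(\d+)$")
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_\d+)?$")


def split_current_and_obsolete(dir_names):
    """Return (current, obsolete) names, preserving the input order.

    A base is obsolete if a newer base exists. A delta is obsolete if its
    highest transaction id is already covered by the newest base.
    """
    base_ids = [int(m.group(1)) for m in map(BASE_RE.match, dir_names) if m]
    newest_base = max(base_ids, default=-1)

    current, obsolete = [], []
    for name in dir_names:
        base = BASE_RE.match(name)
        delta = DELTA_RE.match(name)
        if base:
            is_obsolete = int(base.group(1)) != newest_base
        elif delta:
            is_obsolete = int(delta.group(2)) <= newest_base
        else:
            continue  # not a base or delta directory, ignore it
        (obsolete if is_obsolete else current).append(name)
    return current, obsolete
```

Anything that lands in the second list is what the cleaner would remove once no reader needs it, which is why it can still show up in a listing taken right after compaction.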

> 
> 	• How can you customize the compaction criteria?
You can modify when Hive decides to initiate compaction and how many resources it allocates
to compacting.  See https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-NewConfigurationParametersforTransactions
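
As a rough picture of what the two threshold parameters on that page mean, here is a simplified Python sketch. hive.compactor.delta.num.threshold (default 10) is the number of delta directories that triggers a minor compaction, and hive.compactor.delta.pct.threshold (default 0.1) is the size of the deltas relative to the base that triggers a major compaction. The function name, its arguments, and the strict comparisons are assumptions for illustration. The real initiator handles more cases, for example a partition that has no base yet.

```python
def compaction_to_request(num_deltas, delta_bytes, base_bytes,
                          num_threshold=10, pct_threshold=0.1):
    """Return "major", "minor", or None for one table or partition.

    Simplified: the strict '>' comparisons are an assumption, and the
    no-base case is not modeled.
    """
    # Major: deltas have grown large relative to the base.
    if base_bytes > 0 and delta_bytes / base_bytes > pct_threshold:
        return "major"
    # Minor: too many delta directories have accumulated.
    if num_deltas > num_threshold:
        return "minor"
    return None
```

For example, with the defaults, 3 deltas totalling half the size of the base would ask for a major compaction, while 12 small deltas would ask for a minor one.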

Alan.

> Also, any link to documentation/material is really appreciated. 
> 
> Thank you all for your time.
> 
> Riccardo

