hive-user mailing list archives

From Nishant Aggarwal <nishant....@gmail.com>
Subject Re: Compaction in hive
Date Wed, 07 Dec 2016 17:58:19 GMT
Hi Alan,

Good Morning

Thanks for your reply.

We have lots of external tables in Parquet and gzip formats.
Data is pushed to those tables at regular intervals, with a volume close
to 10-20 GB per day.
Our concern is that this process will generate lots of small files in the
tables. We are looking for a way to merge or concatenate these files so
that the number of files in each month partition stays small.

Let me know your view on the best way forward to achieve this without
compromising table performance.

One quick question: does concatenation work only on ORC/RC formats, or is
it applicable to Parquet as well?
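
For reference, the Hive DDL manual describes ALTER TABLE ... CONCATENATE
for RCFile and ORC tables only, so a Parquet table would need some other
route. One common fallback is to rewrite a partition onto itself with
Hive's small-file merge settings enabled. The sketch below is only an
illustration under stated assumptions, not something proposed in this
thread: the table name `events`, the partition column `month`, the column
list, and the 256 MB target are all hypothetical. The helper only builds
the HiveQL text, and whether it behaves as intended on Hive 1.1.0 would
need checking.

```python
def build_compaction_hql(table, columns, partition_col, partition_value,
                         target_bytes=256 * 1024 * 1024):
    """Return HiveQL that rewrites one partition into fewer, larger files.

    All table, column, and partition names passed in are placeholders
    chosen by the caller; nothing here talks to a Hive server.
    """
    # Non-partition columns must be listed explicitly, because the
    # partition column is supplied by the static PARTITION clause.
    col_list = ", ".join(columns)
    return "\n".join([
        # Merge small output files of map-only and map-reduce jobs.
        "SET hive.merge.mapfiles=true;",
        "SET hive.merge.mapredfiles=true;",
        # Trigger the merge when the average output file is below the target.
        f"SET hive.merge.smallfiles.avgsize={target_bytes};",
        # Desired size of each merged file.
        f"SET hive.merge.size.per.task={target_bytes};",
        f"INSERT OVERWRITE TABLE {table} "
        f"PARTITION ({partition_col}='{partition_value}')",
        f"SELECT {col_list} FROM {table}",
        f"WHERE {partition_col}='{partition_value}';",
    ])


# Hypothetical usage: compact the November 2016 partition of `events`.
print(build_compaction_hql("events", ["id", "payload"], "month", "2016-11"))
```

Note that this rewrite reads and writes the whole partition, so it carries
the same kind of CPU cost as the map-file merge mentioned in the original
post. The difference is that it can be scheduled once per partition
rather than paid on every load.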

Thanks again for your comments.


Best,
Nishant




Thanks and Regards
Nishant Aggarwal, PMP
Cell No:- +91 99588 94305
http://in.linkedin.com/pub/nishant-aggarwal/53/698/11b


On Wed, Dec 7, 2016 at 12:32 AM, Alan Gates <alanfgates@gmail.com> wrote:

> What exactly do you mean by compaction?  Hive has a compactor that runs
> over ACID tables to handle the delta files[1], but I’m guessing you don’t
> mean that.  Are you wanting to concatenate files in existing tables?  The
> usual way to do that is alter table concatenate[2].  Or do you mean
> something else?
>
> Alan.
>
> 1. see https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Compactor
> 2. see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionConcatenate
>
> > On Dec 6, 2016, at 07:03, Nishant Aggarwal <nishant.agg@gmail.com> wrote:
> >
> > Dear Hive Gurus,
> >
> > I am looking for a practical solution on how to implement Compaction
> > in Hive. HiveServer2 version 1.1.0.
> >
> > We have some external Hive tables on which we need to implement
> > Compaction.
> >
> > Merging the map files is one option which is turned down since it is
> > very CPU intensive.
> >
> > Need your help in order to implement Compaction: how to implement it,
> > and what are the pros and cons.
> >
> > Also, is it mandatory to have bucketing to implement compaction?
> >
> > Request you to please help.
> >
> > Thanks and Regards
> > Nishant Aggarwal, PMP
> > Cell No:- +91 99588 94305
> > http://in.linkedin.com/pub/nishant-aggarwal/53/698/11b
> >
>
>
