hive-user mailing list archives

From MiaoMiao <liy...@gmail.com>
Subject Re: How to run big queries in optimized way ?
Date Fri, 21 Sep 2012 02:40:34 GMT
Hive supports a columnar storage format called RCFile, which can give
better performance, but in my project it only performed on par with the
plain-text format.
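
For reference, a minimal sketch of trying RCFile on a copy of an existing
table. The table and column names (page_views, page_views_rc, user_id,
country, view_date) are made up for illustration, and whether it helps
depends on your data and queries:

```sql
-- Hypothetical table and columns, for illustration only.
CREATE TABLE page_views_rc (
  user_id   STRING,
  country   STRING,
  view_date STRING
)
STORED AS RCFILE;

-- Copy the plain-text data into the RCFile-backed table.
INSERT OVERWRITE TABLE page_views_rc
SELECT user_id, country, view_date
FROM page_views;
```

Running the same query against both tables is an easy way to see whether
the format makes any difference for your workload.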

Hive also has an index feature, but it is not very convenient or practical.

I think the best way to optimize is still to reuse the same source
tables, avoid sub-queries, and merge as many HiveQL statements as possible.
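
One way to merge statements that read the same source table is Hive's
multi-table insert, which scans the source once and writes several
results. A rough sketch, assuming a hypothetical page_views table and
two hypothetical target tables that already exist:

```sql
-- Hypothetical table and column names, for illustration only.
-- page_views is scanned once; both aggregates are written from that scan.
FROM page_views pv
INSERT OVERWRITE TABLE views_by_country
  SELECT pv.country, COUNT(*)
  GROUP BY pv.country
INSERT OVERWRITE TABLE views_by_day
  SELECT pv.view_date, COUNT(*)
  GROUP BY pv.view_date;
```

Written as two separate INSERT ... SELECT statements, the same work would
read the source table twice.
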
On Fri, Sep 21, 2012 at 10:30 AM, Mapred Learn <mapred.learn@gmail.com> wrote:
> Hi,
> We have datasets which are about 10-15 TB in size.
>
> We want to run hive queries on top of this input data.
>
> What are ways to reduce stress on our cluster for running many such big queries( include joins too) in parallel ?
> How to enable compression etc for intermediate hive output ?
> How to make job cache does not go to high etc ?
> In short , best practices for huge queries on hive ?
>
> Any inputs are really appreciated !
>
> Thanks,
> JJ
>
> Sent from my iPhone
