hive-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: Improving query performance on hive and hdfs
Date Wed, 05 Sep 2012 07:32:17 GMT
It would be interesting to know the size of the data, how it is stored
within Hive and what kind of query you run on it.
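A quick way to reason about "the size of the data" is to multiply the record count by the average record width. The sketch below assumes a hypothetical 100-byte average record and a hypothetical 16 GiB machine. Measure the real width from your own table files before drawing conclusions.

```python
# Rough sizing sketch: total bytes = record count x average bytes per record.
# The 100-byte average width and the 16 GiB of RAM are hypothetical values
# chosen for illustration only.

def estimate_size_bytes(record_count: int, avg_record_bytes: float) -> float:
    """Return the approximate raw size of a dataset in bytes."""
    return record_count * avg_record_bytes


def fits_in_memory(size_bytes: float, memory_bytes: float) -> bool:
    """True if the raw data would fit in the given RAM (ignores any overhead)."""
    return size_bytes <= memory_bytes


if __name__ == "__main__":
    records = 90_000_000
    avg_width = 100  # hypothetical average bytes per record
    size = estimate_size_bytes(records, avg_width)
    print(f"~{size / 2**30:.2f} GiB")  # 9e9 bytes is about 8.38 GiB
    print(fits_in_memory(size, 16 * 2**30))  # hypothetical 16 GiB machine
```

If the estimate fits comfortably in one machine's memory, a single-server database has a natural advantage over a job that pays Hadoop's startup and shuffle costs.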

For example, 90,000,000 records could amount to less than 64 MB and could
even all be loaded into memory. In that case, yes, it is not surprising that
alternatives outperform Hadoop.

If you are using regexes to parse each line (the row format), that could
be a point for improvement.
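To illustrate why regex parsing can cost more than splitting on a fixed delimiter, here is a small Python analogy (Hive's own SerDes run in Java, so this only shows the general idea). The tab-separated line and its three fields are hypothetical. Both parsers extract the same fields; the timing comparison will vary by machine, but plain splitting usually does less work per line than a regex match.

```python
import re
import timeit

# Hypothetical tab-separated line with three fields: user id, timestamp, url.
LINE = "42\t2012-09-05 07:32:17\thttp://example.com/page"

# Regex-based parsing, similar in spirit to a regex row format.
PATTERN = re.compile(r"([^\t]*)\t([^\t]*)\t([^\t]*)")


def parse_with_regex(line):
    """Match the whole line against the pattern and return its groups."""
    m = PATTERN.fullmatch(line)
    return list(m.groups()) if m else None


def parse_with_split(line):
    """Split on the delimiter, similar in spirit to a delimited row format."""
    return line.split("\t")


if __name__ == "__main__":
    assert parse_with_regex(LINE) == parse_with_split(LINE)
    n = 100_000
    t_regex = timeit.timeit(lambda: parse_with_regex(LINE), number=n)
    t_split = timeit.timeit(lambda: parse_with_split(LINE), number=n)
    print(f"regex: {t_regex:.3f}s  split: {t_split:.3f}s for {n} lines")
```

If the data allows it, storing it with a simple delimiter avoids running a regex over every one of the 90 million lines.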

Then again, depending on the query (multiple joins? GROUP BY?), the shape
of the query itself could have a huge impact too.
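As one example of how a GROUP BY's cost depends on execution strategy, the sketch below simulates map-side partial aggregation, the idea behind Hive's map-side aggregation (the `hive.map.aggr` property). The splits and keys are hypothetical data. Both versions produce the same counts, but pre-aggregating inside each "mapper" shrinks how many records must be shuffled to the reducers.

```python
from collections import Counter

# Each inner list stands for the rows one mapper reads (hypothetical data);
# each value is the GROUP BY key, e.g. a country code.
SPLITS = [
    ["fr", "fr", "us", "fr", "us"],
    ["us", "us", "de", "us"],
    ["fr", "de", "de"],
]


def group_count_naive(splits):
    """Send every row to the reducers: one shuffled record per input row."""
    shuffled = [(key, 1) for split in splits for key in split]
    totals = Counter()
    for key, n in shuffled:
        totals[key] += n
    return dict(totals), len(shuffled)


def group_count_partial(splits):
    """Pre-aggregate inside each mapper, then merge the partial counts."""
    shuffled = []
    for split in splits:
        # One record per distinct key per mapper instead of one per row.
        shuffled.extend(Counter(split).items())
    totals = Counter()
    for key, n in shuffled:
        totals[key] += n
    return dict(totals), len(shuffled)


if __name__ == "__main__":
    print(group_count_naive(SPLITS))    # 12 records shuffled
    print(group_count_partial(SPLITS))  # 6 records shuffled, same totals
```

On real data with few distinct keys per split, the reduction in shuffled records can be far larger than in this toy example.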

Regards

Bertrand

On Wed, Sep 5, 2012 at 8:28 AM, MiaoMiao <liy099@gmail.com> wrote:

> You store 90 million records in a DB? What kind?
>
> Sure, there are some optimizations to speed up Hive queries, but I don't
> see a universal one, except adding more servers.
>
> On Wed, Sep 5, 2012 at 2:19 PM, iwannaplay games
> <funnlearnforkids@gmail.com> wrote:
> > Hi all,
> >
> > I ran a query in Hive on top of 90 million records that took 12 minutes to
> > execute, and the same query on SQL Server took 8 minutes. My question is:
> > how can I make Hadoop's performance better? Which configurations will
> > improve the latency?
> >
> > Thanks & Regards
> > Prabhjot
>



-- 
Bertrand Dechoux
