hadoop-user mailing list archives

From Vasco Visser <vasco.vis...@gmail.com>
Subject Re: Improving query performance on hive and hdfs
Date Wed, 05 Sep 2012 09:21:58 GMT
Keep in mind that Hadoop is not designed for low-latency queries. To say
anything useful, I think you should share some more details:

- What query are you launching (does it have a join or GROUP BY)?
- How many mappers, reducers and jobs does the query spawn?
- What does your data look like?
- Also, which version of Hadoop are you running, etc.?

Depending on the answers to the above, some things that may apply:
- Check if you can partition your data so that Hive can do partition pruning.
- If your query has joins, look at
https://cwiki.apache.org/Hive/languagemanual-joins.html (bottom of
page) to see how to organize your data to let Hive do a map-side join.
- Try tuning the config option
mapreduce.job.reduce.slowstart.completedmaps; this can help if you
have a lot of idle reducers during the map phase.
- I would try to limit the number of tasks per node to the number of
CPUs on the system, but I don't know if this is common practice.
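To make partition pruning concrete, here is a minimal Python sketch (not Hive's actual planner), assuming a hypothetical table laid out Hive-style with one directory per value of a partition column `dt`. The paths and function names are made up for illustration. A filter on the partition column lets the planner skip whole directories instead of scanning every row:

```python
# Hypothetical Hive-style layout: one HDFS directory per value of the
# partition column "dt", e.g. /warehouse/sales/dt=2012-09-01
PARTITION_DIRS = [
    "/warehouse/sales/dt=2012-09-01",
    "/warehouse/sales/dt=2012-09-02",
    "/warehouse/sales/dt=2012-09-03",
    "/warehouse/sales/dt=2012-09-04",
]


def partition_value(path: str, column: str) -> str:
    """Return the value of `column` from a path segment like 'dt=2012-09-01'."""
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep and key == column:
            return value
    raise ValueError(f"{path!r} has no partition column {column!r}")


def prune(dirs: list[str], column: str, lo: str, hi: str) -> list[str]:
    """Keep only partitions whose value lies in [lo, hi].

    This is what a filter like WHERE dt BETWEEN lo AND hi on a partition
    column allows: directories outside the range are never read.
    ISO dates compare correctly as plain strings.
    """
    return [d for d in dirs if lo <= partition_value(d, column) <= hi]


if __name__ == "__main__":
    # Only 2 of the 4 directories need to be scanned.
    print(prune(PARTITION_DIRS, "dt", "2012-09-02", "2012-09-03"))
```

With 90 million rows spread over many partitions, a query that touches only a few of them reads a correspondingly small fraction of the data.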
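The idea behind a map-side join can be sketched in a few lines of Python. This is the generic broadcast hash join technique, not Hive's code, and the tables are hypothetical: the small table is loaded into a hash table in every mapper, and the large table is streamed past it, so the join needs no shuffle or reduce phase.

```python
# Hypothetical small table (country_id, country_name); small enough to
# hold in memory in every mapper.
SMALL_TABLE = [(1, "NL"), (2, "IN"), (3, "US")]

# Hypothetical large table (order_id, country_id, amount), streamed row by row.
LARGE_TABLE = [(100, 1, 9.5), (101, 2, 3.0), (102, 4, 7.25), (103, 1, 1.0)]


def build_hash_table(rows):
    """Index the small table by join key; done once per mapper."""
    table = {}
    for key, value in rows:
        table.setdefault(key, []).append(value)
    return table


def map_side_join(large_rows, lookup):
    """Inner-join each streamed large-table row against the in-memory index.

    Every row is joined locally as it is read, so the join itself needs
    no shuffle or reduce phase.
    """
    for order_id, country_id, amount in large_rows:
        # Rows with no match are dropped (inner-join semantics).
        for name in lookup.get(country_id, []):
            yield (order_id, name, amount)


if __name__ == "__main__":
    lookup = build_hash_table(SMALL_TABLE)
    for row in map_side_join(LARGE_TABLE, lookup):
        print(row)
```

This only pays off when one side of the join really fits in memory on each node; otherwise a regular reduce-side join is still needed.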
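Roughly, the slowstart setting is the fraction of a job's map tasks that must finish before its reducers are scheduled (0.05 by default). The small Python sketch below shows the arithmetic; the function name is hypothetical, and rounding up is an assumption that may not match every Hadoop version exactly:

```python
import math


def maps_before_reducers_start(total_maps: int, slowstart: float = 0.05) -> int:
    """Completed map tasks needed before reducers are scheduled.

    `slowstart` mirrors mapreduce.job.reduce.slowstart.completedmaps
    (0.05 by default). Rounding up is an assumption of this sketch.
    """
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


if __name__ == "__main__":
    for value in (0.05, 0.5, 1.0):
        print(value, maps_before_reducers_start(200, value))
```

With 200 map tasks and the default, reducers can start after about 10 maps finish and then mostly wait for map output; a value closer to 1.0 keeps those slots free for map tasks until the map phase is nearly done.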

On Wed, Sep 5, 2012 at 8:19 AM, iwannaplay games
<funnlearnforkids@gmail.com> wrote:
> Hi all,
> I ran a query on Hive on top of 90 million records that took 12 minutes to
> execute, and the same query on SQL Server took 8 minutes. My question is how
> can I make Hadoop's performance better. Which configurations will improve the
> latency?
> Thanks & Regards
> Prabhjot
