hive-dev mailing list archives

From Jerome Banks <jer...@klout.com>
Subject Re: Concurrency in hive
Date Thu, 21 Jun 2012 17:17:32 GMT
set hive.exec.parallel=true;

This lets Hive run the MapReduce jobs of a query in parallel when they do
not depend on each other.
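
A minimal sketch of how this setting might be used, assuming two hypothetical tables `clicks` and `views` (neither appears in this thread). The two aggregations inside the UNION ALL do not depend on each other, so with `hive.exec.parallel` on, Hive can submit their jobs at the same time. `hive.exec.parallel.thread.number` caps how many jobs run concurrently; 8 is its documented default and is shown here only for illustration.

```sql
-- Allow independent stages of a single query to run concurrently.
set hive.exec.parallel=true;
-- Upper bound on the number of jobs launched in parallel (default is 8).
set hive.exec.parallel.thread.number=8;

-- Hypothetical tables: the two subqueries are independent of each other,
-- so their MapReduce jobs are candidates for parallel execution.
SELECT u.src, u.cnt
FROM (
  SELECT 'clicks' AS src, COUNT(*) AS cnt FROM clicks
  UNION ALL
  SELECT 'views' AS src, COUNT(*) AS cnt FROM views
) u;
```

Whether the jobs actually overlap still depends on the cluster having free task slots when they are submitted.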

As for multi-threading within the job itself, I don't think so, but I'm
not sure. The query planner merges steps together to minimize the number of
MR jobs needed to run a query, but I think the merged steps are chained
together in a single thread, on both the mapper and the reducer.

When I was at Quantcast, we had some multi-threading in the mappers and
reducers to increase throughput by using the CPU while the job would
otherwise be blocked on IO.  This helps if your IO is very slow, but once
IO is no longer the bottleneck, you spend a lot of time context-switching
and it is no longer efficient.

Interesting question, I'll look into it some more. Let me know if you find
out anything.

-- jerome

On Thu, Jun 21, 2012 at 1:16 AM, Jayanth Muthya <jayanthmuthya@gmail.com> wrote:

> Hi,
> I was looking into some of the source code for Hive and had a few
> questions regarding parallelism in Hive. Can a map task in
> Hive exploit parallelism and run multiple threads? If it can, does
> it do so by default, or does a user have to configure the settings?
> This question seems really basic; I just started looking into Hadoop/Hive.
> Thanks in advance!
>
> -Jay
>
