hive-user mailing list archives

From Ravi Jagannathan <Ravi.Jagannat...@nominum.com>
Subject How to decrease the number of Mappers (not reducers) ?
Date Tue, 25 Aug 2009 20:08:25 GMT


Hive is spawning too many mappers. The table has approximately 50K rows (5,654,500 bytes).
The query is: select count(1) from TABLE group by COLUMN
There are only 2 nodes.
On the Web UI I can see that 1001 maps are spawned, each of which takes about 1 sec to run.
Only 2 mappers run at a time, so the map phase takes about 1000 x 1 sec, or roughly 15 minutes,
which is unacceptable.
After that, the reduce > copy phase takes another 10 minutes. The reduce > reduce phase finishes
very fast. How can I reduce the number of maps?
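
As a rough sanity check on these numbers, here is a small sketch that works out the average input per map and the number of map waves. It uses only the figures quoted above. The variable names are made up for illustration, and the per-task scheduling overhead is not something I measured, so the gap between the computed run time and the roughly 15 minutes observed is only presumed to be task startup cost.

```python
import math

# Figures quoted in the message above.
total_bytes = 5_654_500   # table size in bytes
num_maps = 1001           # maps shown in the Web UI
concurrent_maps = 2       # maps running at the same time
secs_per_map = 1          # approximate run time of one map

# Average amount of input each map receives.
bytes_per_map = total_bytes / num_maps

# Maps run in "waves" of at most `concurrent_maps` tasks.
waves = math.ceil(num_maps / concurrent_maps)

# Time spent actually running maps, ignoring any per-task overhead
# (overhead is an unmeasured assumption, so it is left out here).
pure_run_secs = waves * secs_per_map

print(f"average input per map: {bytes_per_map:.0f} bytes")
print(f"waves of maps: {waves}")
print(f"map run time excluding overhead: {pure_run_secs} s")
```

Each map handles only a few KB on average, so almost all of the elapsed time goes to launching tasks rather than processing rows.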

Things I tried:
I tried changing hadoop-site.xml and restarting the Hive and Hadoop servers, but the
mapred.map.tasks parameter that I changed does not show up in job.xml, as if Hive suppressed
the change. The Python Hive client does not allow a set command. I tried set in the CLI, but
that has no effect either.
Versions: Hadoop 0.19.1, Hive 0.3
