hive-user mailing list archives

From Edward Capriolo <>
Subject Re: mapred.user.jobconf.limit question
Date Wed, 20 Jun 2012 20:12:09 GMT
If you have a query producing that many partitions, it is probably a
bad idea. Consider using Hive's bucketing or changing your
partitioning scheme.
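
For example (a rough sketch; the table and column names are made up),
a single coarse partition column plus buckets on a high-cardinality
column keeps the partition count small:

  -- Hypothetical schema: partition only by date, and bucket by
  -- user_id instead of adding more partition columns.
  CREATE TABLE events (
    user_id BIGINT,
    payload STRING
  )
  PARTITIONED BY (dt STRING)
  CLUSTERED BY (user_id) INTO 32 BUCKETS;

  -- Needed so inserts actually write rows into the declared buckets.
  SET hive.enforce.bucketing = true;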


On Wed, Jun 20, 2012 at 12:52 PM, Greg Fuller <> wrote:
> Hi,
> I sent this out to another list yesterday, but the hive mailing list might be a better
> forum for this question.
> With CDH3u4 and Cloudera Manager, I am running a hive query to repartition all of our
> tables.  I'm reducing the number of partition levels from 5 to 2, because the performance
> benefit of a smaller mapred.input.dir is significant, something I only realized as our
> tables grew in size, and there was little perceived benefit from the extra partitions
> given our typical queries.  After adjusting hive.exec.max.dynamic.partitions to deal with
> the enormous number of partitions in our larger tables, I got this exception when running
> the conversion:
> org.apache.hadoop.ipc.RemoteException: Exceeded
max jobconf size: 5445900 limit: 5242880
>       at org.apache.hadoop.mapred.JobTracker.submitJob(
>       at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>       at java.lang.reflect.Method.invoke(
>        …
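> For reference, the conversion was a dynamic-partition insert of roughly this shape (the
> table and column names here are made up):
>
>   SET hive.exec.dynamic.partition=true;
>   SET hive.exec.dynamic.partition.mode=nonstrict;
>   SET hive.exec.max.dynamic.partitions=100000;
>
>   -- Hypothetical names; reads the old 5-level layout, writes a 2-level one.
>   -- The dynamic partition columns (dt, region) must come last in the SELECT.
>   INSERT OVERWRITE TABLE events_v2 PARTITION (dt, region)
>   SELECT user_id, payload, dt, region
>   FROM events_v1;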
> I solved this problem in a roundabout way such that the query was successful, but I
> don't understand why, and I would like to get a better grasp on what is going on.  I tried
> these things:
> 1) In my hive query file, I added "set mapred.user.jobconf.limit=7000000;" before the
query, but I saw the exact same exception.
> 2) Since setting mapred.user.jobconf.limit from the CLI didn't seem to be working, I
> used the safety valve for the JobTracker in Cloudera Manager to add this:
> <property>
>   <name>mapred.user.jobconf.limit</name>
>   <value>7000000</value>
> </property>
> and then I saved those changes, restarted the JobTracker, and reran the query.  I saw
> the same exception.
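> (A follow-up check for anyone hitting this: to see whether the value actually reached
> the JobTracker, grep the generated config on the JobTracker host.  The path below is an
> assumption and depends on how Cloudera Manager deploys configs.)
>
>   # Run on the JobTracker host; the config path is a guess.
>   grep -A 1 'mapred.user.jobconf.limit' /etc/hadoop/conf/mapred-site.xml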
> Digging further, I used "set -v" in my hive query file to see the value of mapred.user.jobconf.limit,
and I discovered:
> a) hive -e "set mapred.user.jobconf.limit=7000000; set -v" | grep mapred.user.jobconf.limit
showed the value as 7000000, so it seems as if the CLI setting is being observed.
> b) hive -e "set -v" | grep mapred.user.jobconf.limit showed the value as 5242880, which
suggests that the safety valve isn't working (?).
> 3) Finally, I wondered if there was a hard-coded 5MB maximum for mapred.user.jobconf.limit,
> even though I had looked at the code and seen nothing obvious, so I tried
> "set mapred.user.jobconf.limit=100" to set it to a very small value, expecting the
> exception to now report the limit as '100'.  Guess what?  The query executed successfully,
> which makes absolutely no sense to me.
> FYI, the size in bytes of mapred.input.dir for this query was 5392189.
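> (For scale: the limit in the exception, 5242880 bytes, is 5 * 1024 * 1024, so the input
> paths alone already push the jobconf over the 5MB default.)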
> Does anyone know why:
> 1) The safety valve setting wasn't observed,
> 2) The CLI setting, which seemed to be observed, was not used, at least according to the
> limit stated by the exception, and
> 3) Setting mapred.user.jobconf.limit to an absurdly low number actually allowed the
> query to succeed?
> Thanks,
> Greg
