hive-user mailing list archives

From Greg Fuller <>
Subject mapred.user.jobconf.limit question
Date Wed, 20 Jun 2012 16:52:24 GMT

I sent this out yesterday, but the hive mailing list might
be a better forum for this question.

I'm on CDH3u4 with Cloudera Manager, and I'm running a hive query to repartition all of our
tables.  I'm reducing the number of partitions from 5 to 2, because the performance benefit
of a smaller mapred.input.dir is significant.  I only realized this as our tables grew in
size, and given our typical queries there was little perceived benefit from the extra
partitions.  After raising hive.exec.max.dynamic.partitions to handle the enormous number
of partitions in our larger tables, I got this exception when running the conversion:

org.apache.hadoop.ipc.RemoteException: Exceeded
max jobconf size: 5445900 limit: 5242880
       at org.apache.hadoop.mapred.JobTracker.submitJob( 
       at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(
       at java.lang.reflect.Method.invoke(

I solved this problem in a round-about way such that the query was successful, but I don't
understand why, and I'd like to get a better grasp on what is going on.  I tried these things:

1) In my hive query file, I added "set mapred.user.jobconf.limit=7000000;" before the query,
but I saw the exact same exception.
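For reference, the override was just the standard Hive "set" syntax placed ahead of the
query in the script (the INSERT shown here is only a placeholder for the actual
repartitioning statement, which isn't reproduced in this message):

    -- the set must appear before the query it is meant to affect
    set mapred.user.jobconf.limit=7000000;

    -- placeholder: the real dynamic-partition repartitioning query went here
    INSERT OVERWRITE TABLE target PARTITION (dt) SELECT ... ;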

2) Since setting mapred.user.jobconf.limit from the CLI didn't seem to be working, I used
the safety valve for the jobtracker via cloudera manager to add this:
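(The snippet itself didn't survive in the archive; presumably it was the standard
mapred-site property XML for the setting, reconstructed here as an assumption:)

    <!-- assumed reconstruction of the jobtracker safety valve snippet -->
    <property>
      <name>mapred.user.jobconf.limit</name>
      <value>7000000</value>
    </property>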


and then I saved those changes, restarted the job tracker, and reran the query.  I saw the
same exception.

Digging further, I used "set -v" in my hive query file to see the value of mapred.user.jobconf.limit,
and I discovered:

a) hive -e "set mapred.user.jobconf.limit=7000000; set -v" | grep mapred.user.jobconf.limit
   showed the value as 7000000, so it seems the CLI setting is being observed.
b) hive -e "set -v" | grep mapred.user.jobconf.limit
   showed the value as 5242880, which suggests that the safety valve isn't working (?).

3) Finally, I wondered whether there was a hard-coded 5MB maximum for mapred.user.jobconf.limit,
although I looked at the code and saw nothing obvious.  To test this, I tried
"set mapred.user.jobconf.limit=100" to shrink the limit to a very small value, expecting the
exception to now report the limit as '100'.  Guess what?  The query executed successfully,
which makes absolutely no sense to me.

FYI, the size in bytes of mapred.input.dir for this query was 5392189.

Does anyone know why:

1) The safety valve setting wasn't observed,
2) The CLI setting, which seemed to be observed, was not used, at least according to the limit
stated by the exception, and
3) Setting mapred.user.jobconf.limit to an absurdly low number actually allowed the query
to succeed?
