hive-user mailing list archives

From Sanjay Subramanian <>
Subject Re: map tasks are taking ever when running job on 24 TB
Date Thu, 25 Apr 2013 22:10:57 GMT
That’s a lot of partitions for one Hive job! Not sure if that itself is the root of the
issues. There have been quite a few discussions on a max of 1000-ish partitions as
Is your use case conducive to using Combiners (though they cannot be guaranteed to be called)?

From: Srinivas Surasani <<>>
Reply-To: "<>" <<>>
Date: Thursday, April 25, 2013 2:33 PM
To: "<>" <<>>
Subject: map tasks are taking ever when running job on 24 TB


I'm running a Hive job on a 24 TB dataset (34560 partitions). About 500 to 1000 mappers
succeed (out of a total of 80000) and the rest of the mappers take forever (their status
stays at 0% the whole time). Are there any limitations on the number of partitions/dataset? Are there
any parameters to set here?

The same job succeeds on 18 TB (25920 partitions).
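
For reference, a rough back-of-the-envelope check of the figures quoted above, written as a small Python sketch. It assumes binary units (1 TB = 1024^4 bytes) and data spread evenly across partitions and mappers; neither is stated in the thread, and the helper name per_unit_mb is purely illustrative.

```python
# Back-of-the-envelope arithmetic on the numbers quoted in the message.
# Assumptions (not from the thread): binary units, evenly distributed data.
TB = 1024 ** 4
MB = 1024 ** 2


def per_unit_mb(total_tb, units):
    """Average size in MB if total_tb were spread evenly over `units`."""
    return total_tb * TB / units / MB


# Job that stalls: 24 TB, 34560 partitions, 80000 mappers in total.
mb_per_partition_24tb = per_unit_mb(24, 34560)
mb_per_mapper_24tb = per_unit_mb(24, 80000)
mappers_per_partition_24tb = 80000 / 34560

# Job that succeeds: 18 TB, 25920 partitions (mapper count not given).
mb_per_partition_18tb = per_unit_mb(18, 25920)

print(f"24 TB job: {mb_per_partition_24tb:.1f} MB per partition")
print(f"18 TB job: {mb_per_partition_18tb:.1f} MB per partition")
print(f"24 TB job: {mb_per_mapper_24tb:.1f} MB per mapper")
print(f"24 TB job: {mappers_per_partition_24tb:.2f} mappers per partition")
```

Under those assumptions both jobs average roughly 728 MB per partition, so the two runs differ in the number of partitions and splits rather than in per-partition size. This does not by itself identify the cause of the stalled mappers.
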

I have already set the following in my Hive query:
set mapreduce.jobtracker.split.metainfo.maxsize=-1;


