hive-user mailing list archives

From jason hadoop <>
Subject Re: Job Speed
Date Tue, 27 Jan 2009 16:14:07 GMT
I just realized this was a Hive question. I have no experience with Hive, so
my advice is probably incorrect.

On Tue, Jan 27, 2009 at 8:13 AM, jason hadoop <> wrote:

> It is not clear to me from your email whether you have the number of map
> tasks per machine set to > 1, or whether you are attempting to use a
> multi-threaded mapper.
> How many tasks does the system split your job into, and how many execute at
> once?
> My first guess is that you are getting 300 map tasks, each running for only
> a few seconds, and that most of that time is probably task setup.
> As a first step, you could try packing your 300 small files into as many
> files as you have simultaneous task execution slots, and adjust the input
> split size (probably not necessary) to ensure there is no further splitting.
> The reduces all essentially stall until all of the map tasks are done, so
> the reduce copy speed is a misleading value.
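
For what it's worth, the packing suggestion above could be sketched roughly as
follows. This is only a minimal illustration under assumptions not stated in
the thread: the 300 files are plain text with one record per line, they have
been copied to a local directory first, and the function name `pack_files` and
the `packed-NNNNN` output names are made up for this example. The packed files
would still need to be loaded back into the table afterwards.

```python
from pathlib import Path


def pack_files(input_paths, output_dir, num_outputs):
    """Concatenate many small line-oriented files into num_outputs larger files.

    Hypothetical helper: assumes newline-delimited records in local files.
    num_outputs would be the number of simultaneous map task slots.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    outputs = [output_dir / f"packed-{i:05d}" for i in range(num_outputs)]
    handles = [open(p, "wb") for p in outputs]
    try:
        # Distribute the inputs round-robin so the packed files stay
        # roughly the same size.
        for index, path in enumerate(sorted(input_paths)):
            data = Path(path).read_bytes()
            out = handles[index % num_outputs]
            out.write(data)
            # Keep records separate if an input lacks a trailing newline.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    finally:
        for handle in handles:
            handle.close()
    return outputs
```

With, say, 4 execution slots, `pack_files(paths, "packed", 4)` would turn the
300 inputs into 4 files of about 75 inputs each.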
> On Mon, Jan 26, 2009 at 11:27 PM, Josh Ferguson <> wrote:
>> So I have a table with roughly 145,000 records spread across 300 files.
>> The total size is about 7MB. Right now I'm running one job tracker and one
>> task tracker, which is a high-CPU Amazon box (1.7 GB of RAM, ~ 4 cores). I
>> run the following query:
>> SELECT COUNT(DISTINCT(activities.actor_id)) FROM activities;
>> And it takes about 35 minutes to finish. One of my problems is that I
>> can't get my task tracker to process more than one map at a time even though
>> it has a higher number of maximum map tasks. But even that is relatively
>> fast compared to the reduce which takes about 30 minutes by itself. The
>> status of the task is:
>> reduce > copy (225 of 344 at 0.01 MB/s) >
>> I really don't understand what is going on during this copy step or why it
>> is taking so long. The files are small and they're all inside of Amazon's
>> network. Can you guys help me out?
>> Josh F.
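
To put the numbers from the original message in perspective, here is a quick
back-of-the-envelope calculation. It uses only the figures quoted above
(145,000 records, 300 files, about 7MB) and assumes the data is spread evenly
across the files, which the thread does not confirm.

```python
# Figures quoted in the original message; "about 7MB" is taken as 7 * 1024 KB.
records = 145_000
files = 300
total_kb = 7 * 1024

# Assumes an even spread of data across files (not confirmed in the thread).
kb_per_file = total_kb / files
records_per_file = records / files

print(f"{kb_per_file:.1f} KB per file")         # roughly 24 KB
print(f"{records_per_file:.0f} records per file")  # roughly 483
```

If each file becomes its own map task, every task handles only a few hundred
records, which is consistent with the guess that task setup dominates the
running time.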
