hive-user mailing list archives

From "Lu, Wei" <...@microstrategy.com>
Subject re: Why Move Operations after MapReduce are in sequential?
Date Wed, 07 Mar 2012 17:01:20 GMT
Hi Bejoy.K.S,

  Yes, there are two steps in general, but for my query there are 6 steps: one MapReduce job
and 5 move operations. My question is why the 5 move operations are executed sequentially
rather than in parallel after step 1.

Regards,
Wei
________________________________
From: Bejoy Ks [bejoy_ks@yahoo.com]
Sent: March 7, 2012 7:36
To: user@hive.apache.org
Subject: Re: Why Move Operations after MapReduce are in sequential?

Hi Wei
     Two operations take place for your query
insert OVERWRITE LOCAL DIRECTORY '/disk2/iis1' select * where impressionid<'1239572996000'

1 - A MapReduce job that performs the operation select * where impressionid<'1239572996000'
2 - A file system operation that copies the output of Step 1 from HDFS to the local file
system (hadoop fs -copyToLocal). Step 2 is executed only after Step 1 completes.
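
As a rough illustration of what copying several outputs concurrently could look like outside Hive, here is a minimal Python sketch. It is not how Hive implements the move step. It assumes the query has been changed to write to HDFS directories first (the /tmp/iisN paths are hypothetical), that the hadoop command is on the PATH, and that the local target directories do not already exist. The function names and the `run` parameter are made up for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def copy_to_local(hdfs_dir, local_dir, run=subprocess.check_call):
    """Copy one HDFS directory to the local file system."""
    # Same command as Step 2: hadoop fs -copyToLocal <src> <localdst>
    run(["hadoop", "fs", "-copyToLocal", hdfs_dir, local_dir])
    return local_dir


def copy_all_parallel(pairs, run=subprocess.check_call, max_workers=5):
    """Copy several (hdfs_dir, local_dir) pairs concurrently, one thread each."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(copy_to_local, src, dst, run) for src, dst in pairs]
        # result() re-raises any failure from the worker thread.
        return [f.result() for f in futures]


# Hypothetical HDFS staging paths (assumption); only the /disk2/iisN
# targets come from the query in this thread.
PAIRS = [("/tmp/iis%d" % i, "/disk2/iis%d" % i) for i in range(1, 6)]

# Usage (requires a working Hadoop client):
#   copy_all_parallel(PAIRS)
```

The `run` parameter exists only so the copy command can be swapped out for a dry run. Whether parallel copies actually help depends on local disk and network throughput.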


Regards
Bejoy.K.S

________________________________
From: "Lu, Wei" <wlu@microstrategy.com>
To: "user@hive.apache.org" <user@hive.apache.org>
Sent: Wednesday, March 7, 2012 5:12 PM
Subject: Why Move Operations after MapReduce are in sequential?

Hi,

For the query below, I find that the five Move Operations (after the MapReduce job) are not
executed in parallel.

from impressions2
insert OVERWRITE LOCAL DIRECTORY '/disk2/iis1' select * where impressionid<'1239572996000'
insert OVERWRITE LOCAL DIRECTORY '/disk2/iis2' select * where impressionid<'1239592780000'
AND impressionid>='1239572996000'
insert OVERWRITE LOCAL DIRECTORY '/disk2/iis3' select * where impressionid<'1239648597000'
AND impressionid>='1239592780000'
insert OVERWRITE LOCAL DIRECTORY '/disk2/iis4' select * where impressionid<'1239714028000'
AND impressionid>='1239648597000'
insert OVERWRITE LOCAL DIRECTORY '/disk2/iis5' select * where impressionid>='1239714028000';

------
Ended Job = job_201203060735_0008
Copying data to local directory /disk2/iis1
Copying data to local directory /disk2/iis1
Copying data to local directory /disk2/iis2
Copying data to local directory /disk2/iis2
Copying data to local directory /disk2/iis3
Copying data to local directory /disk2/iis3
Copying data to local directory /disk2/iis4
Copying data to local directory /disk2/iis4
Copying data to local directory /disk2/iis5
Copying data to local directory /disk2/iis5
------


I thought the Move Operations could be done in parallel, which would improve performance
if the MapReduce temp result is pretty large.


Regards,
Wei

