pig-dev mailing list archives

From "Aniket Mokashi" <aniket...@gmail.com>
Subject Re: Review Request 16928: PIG-3463 Pig should use hadoop local mode for small jobs
Date Tue, 21 Jan 2014 07:37:52 GMT


> On Jan. 21, 2014, 5:05 a.m., Daniel Dai wrote:
> > Looks good. We also need to add the configuration to the conf/pig.properties comments (#pig.auto.local.enabled=true, #pig.auto.local.input.maxbytes=100000000), so users know about this configuration.
> > 
> > This also reminds me we should read/write hdfs files in local mode, but that's a different issue.
> 
> Aniket Mokashi wrote:
>     Thanks for the review, Daniel and Cheolsoo. I will add the properties to pig.properties and commit tomorrow morning.
> 
> Cheolsoo Park wrote:
>     Aniket, can we run unit tests before committing? It's not a small patch, so I'd suggest running unit tests. I can run it if that's not convenient for you. Give me one day.

Sounds good. I will run the tests on my side too. Please take your time; I will wait for your +1.


- Aniket


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32339
-----------------------------------------------------------


On Jan. 21, 2014, 2:52 a.m., Aniket Mokashi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16928/
> -----------------------------------------------------------
> 
> (Updated Jan. 21, 2014, 2:52 a.m.)
> 
> 
> Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.
> 
> 
> Bugs: PIG-3463
>     https://issues.apache.org/jira/browse/PIG-3463
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> If pig.auto.local.enabled is set, the JobControlCompiler (JCC) will modify the Configuration of every job with one reducer and an input size less than pig.auto.local.input.maxbytes, so that it is forced to run in local mode. The output of the local run is also written to hdfs.
> 
> 
> Diffs
> -----
> 
>   trunk/src/org/apache/pig/ExecTypeProvider.java 1558572 
>   trunk/src/org/apache/pig/PigConfiguration.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572 
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572 
>   trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572 
>   trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/16928/diff/
> 
> 
> Testing
> -------
> 
> Tried a few scenarios with the patch:
> Load small data, group all, count - works in local mode.
> Load small data, another small data and replicated join - works in local mode.
> Load small data and order by key - all 3 jobs work in local mode.
> Load small data and large data for replicated join - first job runs in local mode, second runs in MR mode.
> Load large data and order by key - first stages run in local mode and the last stage runs in MR mode.
> 
> 
> Thanks,
> 
> Aniket Mokashi
> 
>
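
The selection rule in the Description above can be sketched in a few lines of Java. This is only an illustration of the stated condition (feature enabled, one reducer, input smaller than pig.auto.local.input.maxbytes), not the code in the patch. The class name AutoLocalSketch, the method shouldRunLocally, the use of java.util.Properties, the "false" default for the enabled flag, and the 100000000-byte default (borrowed from the commented-out value suggested for conf/pig.properties) are all assumptions made for the example. "One reducer" is read literally as exactly one.

```java
import java.util.Properties;

public class AutoLocalSketch {
    // Property names as they appear in the review thread.
    static final String ENABLED = "pig.auto.local.enabled";
    static final String MAX_BYTES = "pig.auto.local.input.maxbytes";
    // Assumed default, taken from the value suggested for conf/pig.properties.
    static final long DEFAULT_MAX_BYTES = 100000000L;

    /**
     * Hypothetical helper: returns true when a job would qualify for local
     * mode under the rule stated in the review description.
     */
    public static boolean shouldRunLocally(Properties conf, long totalInputBytes, int numReducers) {
        // Assumption: the feature is off unless explicitly enabled.
        boolean enabled = Boolean.parseBoolean(conf.getProperty(ENABLED, "false"));
        if (!enabled) {
            return false;
        }
        long maxBytes = Long.parseLong(
                conf.getProperty(MAX_BYTES, Long.toString(DEFAULT_MAX_BYTES)));
        // "One reducer" is interpreted as exactly one; input must be strictly smaller.
        return numReducers == 1 && totalInputBytes < maxBytes;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(ENABLED, "true");

        // Small input, single reducer: qualifies for local mode.
        System.out.println(shouldRunLocally(conf, 1000L, 1));      // prints "true"
        // Input above the threshold: stays in MR mode.
        System.out.println(shouldRunLocally(conf, 200000000L, 1)); // prints "false"
    }
}
```

Applying the check per job rather than per script matches the testing notes above, where a single script can have some jobs run in local mode and others in MR mode.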

