pig-dev mailing list archives

From "Adam Warrington" <awarr...@gmail.com>
Subject Re: Review Request: PIG-1702. Fix for task output logs for streaming jobs containing null input-split information.
Date Tue, 31 May 2011 23:10:39 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/547/#review383
-----------------------------------------------------------



trunk/src/org/apache/pig/backend/hadoop/streaming/HadoopExecutableManager.java
<https://reviews.apache.org/r/547/#comment725>

    Referencing PigMapReduce.sJobContext may cause a race condition in local Pig jobs, similar
to the one described in PIG-1831. Should a similar fix be applied here, so that the context in
PigMapReduce is kept in thread-local storage?
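
To make the question concrete, here is a minimal sketch of the thread-local idea, not the actual Pig code. The class ThreadLocalContextSketch, the TaskContext type, and the setContext/getContext/runTwoTasks helpers are all hypothetical names made up for illustration; the real PigMapReduce field and its type may look quite different. The sketch assumes the problem is two local-mode tasks, each running in its own thread, sharing one static field.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only; none of these names come from Pig.
final class ThreadLocalContextSketch {

    // Stand-in for the per-task job context (assumption, not Pig's type).
    static final class TaskContext {
        final String inputSplit;
        TaskContext(String inputSplit) { this.inputSplit = inputSplit; }
    }

    // With a plain static field, the last writer wins for every thread.
    // With ThreadLocal, each thread reads back only the value it set.
    private static final ThreadLocal<TaskContext> CONTEXT = new ThreadLocal<>();

    static void setContext(TaskContext ctx) { CONTEXT.set(ctx); }

    static TaskContext getContext() { return CONTEXT.get(); }

    // Runs two "tasks" in separate threads and returns the split name
    // each one observed after both had stored their own context.
    static String[] runTwoTasks(String splitA, String splitB) {
        final String[] names = {splitA, splitB};
        final String[] seen = new String[2];
        final CountDownLatch bothSet = new CountDownLatch(2);
        Thread[] workers = new Thread[2];
        for (int i = 0; i < 2; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                setContext(new TaskContext(names[id]));
                bothSet.countDown();
                try {
                    // Wait until the other thread has stored its context too.
                    bothSet.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen[id] = getContext().inputSplit;
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = runTwoTasks("split-A", "split-B");
        System.out.println(seen[0] + " " + seen[1]);
    }
}
```

Each thread gets back its own split name even though both write before either reads. With a shared static field in place of the ThreadLocal, both threads could end up reporting whichever context was stored last.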


- Adam


On 2011-05-19 16:27:22, Adam Warrington wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/547/
> -----------------------------------------------------------
> 
> (Updated 2011-05-19 16:27:22)
> 
> 
> Review request for pig.
> 
> 
> Summary
> -------
> 
> This is a patch for PIG-1702, which describes an issue where the task output logs for
PIG streaming jobs contain null input-split information. The ability to query the input-split
information through the JobConf went away with the new MR API. We must now obtain a reference
to the underlying FileSplit and query that reference for the information.
> 
> 
> Diffs
> -----
> 
>   trunk/src/org/apache/pig/backend/hadoop/streaming/HadoopExecutableManager.java 1088692

> 
> Diff: https://reviews.apache.org/r/547/diff
> 
> 
> Testing
> -------
> 
> To test this, I wrote a very simple python script to pass data through using PIG. After
checking the task logs of the completed task, the stderr logs now contain valid input split
information. Below are the scripts and test data used.
> 
> ### PIG commands run ###
> DEFINE testpy `test.py` SHIP ('test.py');
> raw_records = LOAD '/test.txt2'; 
> T1 = STREAM raw_records THROUGH testpy;
> dump T1;
> 
> ### test.py ###
> #!/usr/bin/python
> import sys
> 
> cnt = 0
> for line in sys.stdin:
>     print(line.strip() + " " + str(cnt))
>     cnt += 1
> 
> ### contents of /test.txt on hdfs ###
> one line
> two line
> three line
> four line
> 
> 
> Thanks,
> 
> Adam
> 
>

