pig-dev mailing list archives

From "Daniel Charles (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-2368) Penny doesn't store the output file correctly
Date Tue, 15 Nov 2011 18:20:51 GMT
Penny doesn't store the output file correctly
---------------------------------------------

                 Key: PIG-2368
                 URL: https://issues.apache.org/jira/browse/PIG-2368
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.9.1
         Environment: Testing in a cluster with 6 nodes:
                      Debian Linux 64-bit, Java 1.6, Hadoop 0.20.2
            Reporter: Daniel Charles


When run under Penny, a Pig script either produces no output or fails to filter the data
when the input files are in HDFS.
To reproduce, copy a file with the content below to /home/hadoop/tablea:
0;4;2
1;3;3
2;2;0
3;1;4
4;0;1

script.pig content:
a = load '/home/hadoop/tablea' using PigStorage(';');
b = filter a by $2 < 1000;
store b into '/home/hadoop/tablea.out';
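
As a sanity check on what this script should store, the following Python sketch mimics the filter outside of Pig. It assumes that `$2` is the third `;`-separated field and that it is compared as an integer. The helper name `expected_output` is made up for illustration and is not part of Pig or Penny.

```python
# Illustrative sketch only: mimics `b = filter a by $2 < 1000` in plain Python.
# Assumes $2 is the third ';'-separated field, compared as an integer.
# `expected_output` is a hypothetical helper name, not a Pig or Penny API.

TABLEA = "0;4;2\n1;3;3\n2;2;0\n3;1;4\n4;0;1\n"


def expected_output(text, threshold=1000):
    """Return the lines whose third field is below `threshold`."""
    kept = []
    for line in text.splitlines():
        fields = line.split(";")
        if int(fields[2]) < threshold:
            kept.append(line)
    return kept


rows = expected_output(TABLEA)
# Each stored record is written with one trailing newline.
bytes_written = sum(len(r) + 1 for r in rows)
print(len(rows), bytes_written)
```

Every third-field value is below 1000, so all 5 records (30 bytes including newlines) should be stored.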

PENNY:
Command Line:
java -cp /var/tmp/hadoop-0.20.2/conf:/var/tmp/pig-0.9.1/pig.jar:/var/tmp/pig-0.9.1/contrib/penny/java/penny.jar
 org.apache.pig.penny.apps.ri.Main script.pig b 2
Output summary:

Input(s):
Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"

Output(s):
Successfully stored 0 records in: "/home/hadoop/tablea.out"

Counters:
Total records written : 0 [OUTPUT FILE IS EMPTY]
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
11/11/15 18:57:54 INFO mapReduceLayer.MapReduceLauncher: Success!
----------------------

Running Pig without Penny in the same environment:
Command Line:
pig script.pig
Output summary:
Input(s):
Successfully read 5 records (30 bytes) from: "/home/hadoop/tablea"

Output(s):
Successfully stored 5 records (30 bytes) in: "/home/hadoop/tablea.out"

Counters:
Total records written : 5
Total bytes written : 30
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
================

With a similar script that reads a passwd file, Penny does not apply the filter to the output data:
a = load '/home/hadoop/passwd' using PigStorage(':');
b = filter a by $3 > 200;
store b into '/home/hadoop/passwdout';
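
To make the intended behaviour of this filter concrete, here is a small Python sketch of the same predicate. It assumes that `$3` is the fourth `:`-separated field (the numeric group ID in the usual passwd layout) and that it is compared as an integer. The sample lines and the name `filter_by_field3` are invented for illustration; they are not the reporter's data.

```python
# Illustrative sketch only: mimics `b = filter a by $3 > 200` in plain Python.
# Assumes $3 is the fourth ':'-separated field (the GID in the usual passwd
# layout). SAMPLE_PASSWD is made-up data, not the file from this report.

SAMPLE_PASSWD = [
    "root:x:0:0:root:/root:/bin/bash",
    "daemon:x:1:1:daemon:/usr/sbin:/bin/sh",
    "hadoop:x:1001:1001::/home/hadoop:/bin/bash",
    "nobody:x:65534:65534:nobody:/nonexistent:/bin/sh",
]


def filter_by_field3(lines, minimum=200):
    """Keep entries whose fourth field is strictly greater than `minimum`."""
    return [ln for ln in lines if int(ln.split(":")[3]) > minimum]


kept = filter_by_field3(SAMPLE_PASSWD)
kept_names = [ln.split(":")[0] for ln in kept]
print(kept_names)
```

The run without Penny stores 46 of the 68 input records, so 22 records fail the predicate; an output that still contains all 68 records indicates the filter was not applied.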

Penny
Input(s):
Successfully read 68 records (3513 bytes) from: "/home/hadoop/passwd"

Output(s):
Successfully stored 68 records (3513 bytes) in: "/home/hadoop/passwdout"

Counters:
Total records written : 68 [OUTPUT FILE WASN'T FILTERED CORRECTLY]
Total bytes written : 3513
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

---------------
Pig
Input(s):
Successfully read 68 records (3513 bytes) from: "/home/hadoop/passwd"

Output(s):
Successfully stored 46 records (2555 bytes) in: "/home/hadoop/passwdout"

Counters:
Total records written : 46
Total bytes written : 2555
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

A similar issue can be seen with the -nop- application too.



        
