hadoop-pig-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (PIG-232) Number of output rows in the log seems to be invalid
Date Wed, 07 May 2008 08:29:55 GMT

    [ https://issues.apache.org/jira/browse/PIG-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594825#action_12594825 ]

acmurthy edited comment on PIG-232 at 5/7/08 1:29 AM:
-----------------------------------------------------------

Olga, this is because the stream/store optimization is kicking in, so only the 'binary tuples'
are being reported. Could you please try again with the optimization switched off?

/pig/studenttab10k has 10,000 records. 

Now:
{noformat}
IP = load '/pig/studenttab10k';
OP = stream IP through `perl -ne 'print $_;'`; 
store OP into '/pig/out' using PigStorage(',');
{noformat}

correctly reports 10,000 output records, while:

{noformat}
IP = load '/pig/studenttab10k';
OP = stream IP through `perl -ne 'print $_;'`; 
store OP into '/pig/out';
{noformat}

reports only 4 output records because of the stream/store optimization.

Could you please re-check? Thanks!
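To illustrate why the two counts can differ, here is a minimal Python sketch. It assumes, hypothetically, that the optimized stream/store path forwards the command's stdout as fixed-size byte blocks and counts each block as one 'binary tuple', while the normal path parses and counts newline-delimited records. The 64 KiB block size, the helper names, and the synthetic records are all made up for illustration; they are not Pig's actual buffer size, code, or the contents of studenttab10k.

```python
import io
from typing import BinaryIO

# Hypothetical block size; the real buffer size used by the optimized
# stream/store path is not stated in this thread.
CHUNK_SIZE = 64 * 1024


def count_records(stream: BinaryIO) -> int:
    """Count newline-delimited records, as a record-aware store would."""
    return sum(1 for _ in stream)


def count_binary_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Count fixed-size byte blocks, treating each block as one 'binary tuple'."""
    chunks = 0
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        chunks += 1
    return chunks


def make_synthetic_output(n: int = 10_000) -> bytes:
    """Build n made-up tab-separated records, each exactly 21 bytes long."""
    lines = (
        f"student{i:05d}\t{20 + i % 10}\t{(i % 400) / 100:.2f}\n" for i in range(n)
    )
    return "".join(lines).encode("ascii")


if __name__ == "__main__":
    data = make_synthetic_output()
    print("records:", count_records(io.BytesIO(data)))
    print("binary chunks:", count_binary_chunks(io.BytesIO(data)))
```

With these made-up sizes (210,000 bytes in 64 KiB blocks) the sketch reports 10,000 records but only 4 chunks. The chunk count depends only on the total byte size and the block size, not on how many records the command emitted.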

      was (Author: acmurthy):
    Olga, this is because the stream/store optimization is kicking in, so only the 'binary tuples'
are being reported. Could you please try again with the optimization switched off?

/pig/studenttab10k has 10,000 records. 

Now:
{noformat}
define CMD `script.pl` ship('../pig/scripts/script.pl');
IP = load '/pig/studenttab10k';
OP = stream IP through CMD; 
store OP into '/pig/out' using PigStorage(',');
{noformat}

correctly reports 10,000 output records, while:

{noformat}
define CMD `script.pl` ship('../pig/scripts/script.pl');
IP = load '/pig/studenttab10k';
OP = stream IP through CMD; 
store OP into '/pig/out';
{noformat}

reports only 4 output records because of the stream/store optimization.

Could you please re-check? Thanks!
  
> Number of output rows in the log seems to be invalid 
> -----------------------------------------------------
>
>                 Key: PIG-232
>                 URL: https://issues.apache.org/jira/browse/PIG-232
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Arun C Murthy
>
> My pig script:
> define CMD `perl PigStreamingBad.pl end` ship('PigStreamingBad.pl') stderr('CMD' limit 1);
> A = load 'studenttab10k';
> B = stream A through CMD;
> store B into 'out';
> My perl script:
> use strict;
> use warnings;
> # This script is used to test streaming error cases in Pig.
> # Usage: PigStreamingBad.pl <start|middle|end>
> # The parameter tells the script when to exit with an error.
> if ($#ARGV < 0)
> {
>         print STDERR "Usage: PigStreamingBad.pl <start|middle|end>\n";
>         exit (-1);
> }
> my $pos = $ARGV[0];
> if ($pos eq "start")
> {
>         print STDERR "Failed in the beginning of the processing\n";
>         exit(1);
> }
> print STDERR "PigStreamingBad.pl: starting processing\n";
> my $cnt = 0;
> while (<STDIN>)
> {
>         print "$_";
>         $cnt++;
>         print STDERR "PigStreamingBad.pl: processing $_\n";
>         if (($cnt > 100) && ($pos eq "middle"))
>         {
>                 print STDERR "Failed in the middle of processing\n";
>                 exit(2);
>         }
> }
> print STDERR "Failed at the end of processing\n";
> exit(3);

-- 
This message is automatically generated by JIRA.

