pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2498) e2e tests failing in some cases due to incorrect unix sort args
Date Sat, 10 Nov 2012 21:43:12 GMT

    [ https://issues.apache.org/jira/browse/PIG-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494775#comment-13494775 ]

Rohini Palaniswamy commented on PIG-2498:

 Hit failures because of this on RHEL 6 for some test cases (Order-6,7,8,9,18, Types-20,21,22,23,24,25,
Split-6, BigData-7,8). I had come up with a patch that changes the sort args of the failing tests to
-k style before I came across this jira. The patch here looks good, but I have one comment: since we
are fixing all the sort args, can we move off the obsolete origin-zero syntax to the -k style? I would
be glad to review, test and commit this one. Thanks.

> e2e tests failing in some cases due to incorrect unix sort args
> ---------------------------------------------------------------
>                 Key: PIG-2498
>                 URL: https://issues.apache.org/jira/browse/PIG-2498
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.2
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>         Attachments: PIG-2498.patch
> Some e2e tests are failing for me against 23 due to what I think are incorrect arguments to unix sort. For example, in Order_6:
> {noformat}
> 			'num' => 6,
> 			'pig' => q\a = load ':INPATH:/singlefile/studenttab10k';
> c = order a by $0;
> store c into ':OUTPATH:';\,
> 			'sortArgs' => ['-t', '	', '+0', '-1'],
> {noformat}
> The pig job is sorting by the first column; however, unix sort is being told to sort by the first and second columns.
> From the GNU sort manual (specifically, pos2 is _inclusive_): http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html
> {noformat}
> '-k pos1[,pos2]'
> '--key=pos1[,pos2]'
> Specify a sort field that consists of the part of the line between pos1 and pos2 (or the end of the line, if pos2 is omitted), inclusive.
> ...
> On older systems, sort supports an obsolete origin-zero syntax '+pos1 [-pos2]' for specifying sort keys. The obsolete sequence 'sort +a.x -b.y' is equivalent to 'sort -k a+1.x+1,b' if y is '0' or absent, otherwise it is equivalent to 'sort -k a+1.x+1,b+1.y'.
> {noformat}
> I verified this by running the sort manually with +0 -1 and with +0 -0: the first case fails and the second passes.
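
For anyone converting sortArgs from the origin-zero syntax to the -k style, the equivalence quoted from the GNU sort manual above can be written out mechanically. The following is only a minimal sketch: the function name obsolete_to_k and its parameters are hypothetical and are not part of the e2e harness or of the attached patch. It assumes that an omitted '-pos2' means the key runs to the end of the line, as the '-k pos1[,pos2]' description states.

```python
def obsolete_to_k(a, x=0, b=None, y=0):
    """Translate an obsolete key '+a.x [-b.y]' into the argument of '-k'.

    Follows the rule quoted from the GNU sort manual:
      'sort +a.x -b.y' == 'sort -k a+1.x+1,b'      if y is 0 or absent
      'sort +a.x -b.y' == 'sort -k a+1.x+1,b+1.y'  otherwise
    (hypothetical helper, for illustration only)
    """
    # Start position: both the field and the character offset shift by one.
    start = f"{a + 1}.{x + 1}"
    if b is None:
        # Assumption: no '-pos2' given, so the key extends to end of line.
        return start
    if y == 0:
        # y is 0 or absent: the end position is just field b.
        end = f"{b}"
    else:
        # Otherwise the end field shifts by one and y is kept as is.
        end = f"{b + 1}.{y}"
    return f"{start},{end}"


if __name__ == "__main__":
    # The Order_6 arguments from the issue: '+0', '-1'
    print("-k", obsolete_to_k(0, 0, 1))
```

For the '+0', '-1' arguments in Order_6, this yields '1.1,1', that is, 'sort -k 1.1,1' under the quoted rule.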

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
