hadoop-pig-dev mailing list archives

From "Arthur Zwiegincew (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-390) Union doesn't work
Date Sun, 31 Aug 2008 00:19:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627261#action_12627261 ]

Arthur Zwiegincew commented on PIG-390:
---------------------------------------

Here's a workaround I'm using:

package com.cooliris.analytics;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

/**
 * Implements a UNIONALL Pig function. It accepts a tuple of the format
 * <unused, {bag-1}, {bag-2}, {bag-3}, ...> and outputs a set of tuples
 * corresponding to UNION bag-1, bag-2, bag-3, ... . This is intended as a
 * workaround to bug PIG-390 (Union doesn't work).
 * 
 * Instead of:
 *   combined = UNION data1, data2, data3;
 * ...do the following:
 *   cg_combined = COGROUP data1 BY 1, data2 BY 1, data3 BY 1;
 *   combined = FOREACH cg_combined GENERATE FLATTEN(com.cooliris.analytics.UNIONALL(*));
 * 
 * @author arthur@cooliris.com
 */
public class UNIONALL extends EvalFunc<DataBag> {

    @Override
    public void exec(Tuple input, DataBag output) throws IOException {
        // Field 0 is the constant COGROUP key (the "unused" field); the bags start at field 1.
        for (int i = 1; i < input.arity(); ++i) {
            for (Tuple nested : input.getBagField(i)) {
                output.add(nested);
            }
        }
    }
}
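
To see what the UDF does without a Pig installation, here is a minimal sketch of the same loop using plain Java lists. The class UnionAllSketch, the row helper, and the use of List in place of Pig's Tuple and DataBag are all assumptions made for illustration; none of them are part of Pig. The rows are the ones from the issue report quoted below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the UNIONALL UDF: plain lists replace Pig's Tuple and DataBag.
class UnionAllSketch {

    // Builds one row; stands in for a Pig tuple such as (1, 1).
    static List<Object> row(Object... fields) {
        return Arrays.asList(fields);
    }

    // Concatenates every bag in the cogrouped record.
    // Field 0 is the constant group key, so the loop starts at field 1.
    static List<List<Object>> unionAll(List<Object> cogrouped) {
        List<List<Object>> output = new ArrayList<>();
        for (int i = 1; i < cogrouped.size(); ++i) {
            @SuppressWarnings("unchecked")
            List<List<Object>> bag = (List<List<Object>>) cogrouped.get(i);
            output.addAll(bag);
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<Object>> data = Arrays.asList(row(1, 1), row(2, 1), row(3, 10));
        List<List<Object>> data2 = Arrays.asList(row(4, 20), row(5, 20));

        // Shape of the single record produced by COGROUP data BY 1, data2 BY 1.
        List<Object> cogrouped = Arrays.<Object>asList(1, data, data2);

        // prints [[1, 1], [2, 1], [3, 10], [4, 20], [5, 20]]
        System.out.println(unionAll(cogrouped));
    }
}
```

Because every input is grouped by the same constant, COGROUP yields a single record holding one bag per input, and walking fields 1 through n returns all five rows rather than only the two from the second file.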


> Union doesn't work
> ------------------
>
>                 Key: PIG-390
>                 URL: https://issues.apache.org/jira/browse/PIG-390
>             Project: Pig
>          Issue Type: Bug
>         Environment: Mac OS X
>            Reporter: Arthur Zwiegincew
>
> data files:
> $ cat ~/tmp/data
> 1	1
> 2	1
> 3	10
> $ cat ~/tmp/data-2
> 4	20
> 5	20
> pig script:
> data = load '/Users/arthur/tmp/data' as (x, y);
> data2 = load '/Users/arthur/tmp/data-2' as (x, y);
> both = union data, data2;
> dump both;
> result:
> (4, 20)
> (5, 20)


