hadoop-pig-dev mailing list archives

From "Cristian Ivascu (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-511) DIFF does not work in types branch
Date Thu, 30 Oct 2008 13:43:44 GMT
DIFF does not work in types branch
----------------------------------

                 Key: PIG-511
                 URL: https://issues.apache.org/jira/browse/PIG-511
             Project: Pig
          Issue Type: Bug
          Components: data
    Affects Versions: types_branch
         Environment: CentOS 5, hadoop 0.18.0, pig built from types branch
            Reporter: Cristian Ivascu


Using DIFF(bag1, bag2) always returns an empty bag.

Reason: in compute_diff, the input bags are discarded, and the actual operations are done
against two newly created, empty bags. The iterators i1 and i2 are obtained from d1 and d2
themselves, which are still empty at that point, so the copy loops never execute.

Fix: make sure compute_diff(bag1, bag2, output) obtains its iterators from bag1 and bag2
instead of d1 and d2.

Currently:
        DataBag d1 = mBagFactory.newDistinctBag();
        DataBag d2 = mBagFactory.newDistinctBag();
        // BUG: iterates over the new, empty bags instead of the inputs
        Iterator<Tuple> i1 = d1.iterator();
        Iterator<Tuple> i2 = d2.iterator();
        while (i1.hasNext()) d1.add(i1.next());
        while (i2.hasNext()) d2.add(i2.next());

Should be:
        DataBag d1 = mBagFactory.newDistinctBag();
        DataBag d2 = mBagFactory.newDistinctBag();
        // Iterate over the input bags so their tuples are copied into d1 and d2
        Iterator<Tuple> i1 = bag1.iterator();
        Iterator<Tuple> i2 = bag2.iterator();
        while (i1.hasNext()) d1.add(i1.next());
        while (i2.hasNext()) d2.add(i2.next());
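
For anyone who wants to check the effect of the fix without a Pig build, here is a minimal
standalone sketch of the same copy-then-compare pattern. It is not the Pig code: the class
DiffSketch and the method diff are hypothetical names, java.util.LinkedHashSet stands in for
a distinct bag, and the comparison step (emit elements found in only one input) is an
assumption about what the rest of compute_diff does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DiffSketch {
    // Hypothetical stand-in for compute_diff: returns the elements that
    // appear in exactly one of the two inputs.
    public static <T> List<T> diff(Iterable<T> bag1, Iterable<T> bag2) {
        // Stand-ins for the two distinct bags.
        Set<T> d1 = new LinkedHashSet<>();
        Set<T> d2 = new LinkedHashSet<>();

        // Copy from the INPUTS, as in the proposed fix. Iterating over
        // d1 and d2 here instead would leave both sets empty.
        for (T t : bag1) d1.add(t);
        for (T t : bag2) d2.add(t);

        // Assumed comparison step: keep what is in one set but not the other.
        List<T> out = new ArrayList<>();
        for (T t : d1) if (!d2.contains(t)) out.add(t);
        for (T t : d2) if (!d1.contains(t)) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> bag1 = Arrays.asList(1, 2, 2, 3);
        List<Integer> bag2 = Arrays.asList(3, 4);
        System.out.println(diff(bag1, bag2)); // prints [1, 2, 4]
    }
}
```

If the two copy loops are changed to iterate over d1 and d2, as in the current code, the
method returns an empty list for every input, which matches the reported behaviour.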

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

