hadoop-pig-dev mailing list archives

From "Shravan Matthur Narayanamurthy (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-554) Fragment Replicate Join
Date Tue, 02 Dec 2008 12:10:44 GMT
Fragment Replicate Join

                 Key: PIG-554
                 URL: https://issues.apache.org/jira/browse/PIG-554
             Project: Pig
          Issue Type: New Feature
    Affects Versions: types_branch
            Reporter: Shravan Matthur Narayanamurthy
             Fix For: types_branch

Fragment Replicate Join (FRJ) is useful when we want to join a huge table with a very small
table (small enough to fit in memory) and the join doesn't expand the data by much. The idea
is to distribute the processing of the huge file by fragmenting it and replicating the small
file to all machines receiving a fragment of the huge file. Because the entire small file is
available on each machine, the join becomes a trivial task that needs no break in the pipeline.
Exhaustive tests have been done to determine the improvement we get out of FRJ. I will post the
details in a wiki and add a link here.
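
To illustrate the idea described above, here is a minimal sketch of what one machine
might do with its fragment. It is not the code in the patch: the class name
FragmentReplicateJoinSketch, the methods buildIndex and joinFragment, and the use of
String[] rows are all hypothetical, chosen only for this example. It assumes the
replicated input fits in memory and that the join is an inner equi-join on a single column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and row representation are assumptions.
class FragmentReplicateJoinSketch {

    // Load the replicated (small) input into an in-memory index keyed on the join column.
    static Map<String, List<String[]>> buildIndex(List<String[]> small, int keyCol) {
        Map<String, List<String[]>> index = new HashMap<>();
        for (String[] row : small) {
            index.computeIfAbsent(row[keyCol], k -> new ArrayList<>()).add(row);
        }
        return index;
    }

    // Stream one fragment of the huge input past the index and emit inner-join rows.
    // Each output row is the fragment row followed by the matching replicated row.
    static List<String[]> joinFragment(List<String[]> fragment, int keyCol,
                                       Map<String, List<String[]>> index) {
        List<String[]> out = new ArrayList<>();
        for (String[] big : fragment) {
            List<String[]> matches = index.get(big[keyCol]);
            if (matches == null) {
                continue; // inner join: rows without a match are dropped
            }
            for (String[] small : matches) {
                String[] joined = Arrays.copyOf(big, big.length + small.length);
                System.arraycopy(small, 0, joined, big.length, small.length);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Small table: (id, name). Fragment of the huge table: (event, id).
        List<String[]> small = Arrays.asList(
                new String[] {"1", "alice"},
                new String[] {"2", "bob"});
        List<String[]> fragment = Arrays.asList(
                new String[] {"click", "1"},
                new String[] {"view", "3"},
                new String[] {"click", "2"});

        Map<String, List<String[]>> index = buildIndex(small, 0);
        for (String[] row : joinFragment(fragment, 1, index)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Since every fragment is joined against its own full copy of the small input, no row of
the huge input has to be regrouped by key before joining, which is why the pipeline
needs no break.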

The patch makes changes to parts of the code where new operators are introduced. Currently,
when a new operator is introduced, its alias is not set. For schema computation I have modified
this behaviour to set the alias of the new operator to that of its predecessor. The logical
side of the patch mimics the cogroup behavior, as the join syntax closely resembles that of cogroup.
Currently, this patch supports only inner joins. The rest of the
code has been documented.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
