pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-4420) Support for map side cross similar to replicate join
Date Fri, 13 Feb 2015 14:04:11 GMT
Rohini Palaniswamy created PIG-4420:

             Summary: Support for map side cross similar to replicate join
                 Key: PIG-4420
                 URL: https://issues.apache.org/jira/browse/PIG-4420
             Project: Pig
          Issue Type: New Feature
            Reporter: Rohini Palaniswamy

   Our CROSS implementation is very costly. We recently had a case where a user was doing a
CROSS of 30 million records against 3K records, and it caused a lot of disk error exceptions during
the shuffle phase. We need to add support for a map-side cross syntax:

C = CROSS A, B using 'replicate';
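For a sense of scale, taking "30 million" as 3 × 10^7 and "3K" as exactly 3 × 10^3 (an assumption, since the exact counts are not given), the cross in the case above produces this many output records:

```latex
|A \times B| = |A| \cdot |B| = (3 \times 10^{7}) \cdot (3 \times 10^{3}) = 9 \times 10^{10}
```

A map-side cross would still emit the same number of records. The difference is that it runs as a map-only computation, so the inputs do not have to go through a shuffle at all.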

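The smaller table can be loaded into an in-memory list (where replicated join loads it into a hash map)
and iterated over for each record of the bigger table. Since the cross would then run entirely on the
map side with no shuffle phase, it should give a major performance boost and drastically reduce resource usage.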
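The approach described above can be sketched as a nested loop in which only the small side is buffered. This is a minimal illustration, not Pig's implementation: the class `ReplicatedCrossSketch` and its method `crossWith` are hypothetical names, and the sketch assumes the small relation fits in memory while the large relation is streamed one record at a time, as it would be inside a map task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/**
 * Hypothetical sketch of a map-side ("replicated") cross.
 * Not Pig's actual code; names are illustrative only.
 */
final class ReplicatedCrossSketch<S> {
    // The small relation is held entirely in memory, like the
    // replicated input of a replicated join (but as a list, not a hash map).
    private final List<S> smallSide;

    ReplicatedCrossSketch(Iterable<S> smallInput) {
        List<S> buffer = new ArrayList<>();
        for (S record : smallInput) {
            buffer.add(record);
        }
        this.smallSide = buffer;
    }

    // Called once per record of the large relation; emits one output
    // pair for every record of the buffered small relation.
    <B> void crossWith(B bigRecord, BiConsumer<B, S> emit) {
        for (S small : smallSide) {
            emit.accept(bigRecord, small);
        }
    }

    public static void main(String[] args) {
        ReplicatedCrossSketch<String> cross = new ReplicatedCrossSketch<>(List.of("x", "y", "z"));
        List<String> out = new ArrayList<>();
        // Simulate streaming the large relation record by record.
        for (int big : new int[] {1, 2}) {
            cross.crossWith(big, (b, s) -> out.add(b + s));
        }
        System.out.println(out); // prints [1x, 1y, 1z, 2x, 2y, 2z]
    }
}
```

Memory use is bounded by the size of the small relation, while the large relation is never buffered or shuffled; each record of it is paired with the in-memory list and then discarded.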

This message was sent by Atlassian JIRA
