pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1743) Skewed join sampler generates unevenly partitioned data
Date Mon, 13 Dec 2010 19:16:03 GMT

     [ https://issues.apache.org/jira/browse/PIG-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1743:
--------------------------------

    Comment: was deleted

(was: Dear Sender,
 I am currently OOO from  December 6-10 without any access to my email.
Regards 
Viraj

)

> Skewed join sampler generates unevenly partitioned data
> -------------------------------------------------------
>
>                 Key: PIG-1743
>                 URL: https://issues.apache.org/jira/browse/PIG-1743
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Viraj Bhat
>         Attachments: relation1.in, relation2.in
>
>
> I have a dataset that produces unevenly sized partitions when joined with the skewed join. The script looks like this:
> {code}
> Data1 = LOAD '/user/viraj/relation1.in' AS (ref,intVal);
> Data2 = LOAD '/user/viraj/relation2.in' using PigStorage('\u0001') AS (ID:chararray, Key:chararray, DomainKey:chararray);
> JoinData = JOIN Data1 BY ref LEFT OUTER , Data2 BY ID using 'skewed' PARALLEL 10;
> STORE JoinData into 'skewedoutput' using PigStorage('\u0001');
> {code}
> The generated output consists of the following part files of varying sizes:
> {quote}
> $ hadoop fs -ls /user/viraj/skewedoutput
> Found 10 items
> -rw-------   3 viraj users       2090 2010-11-23 03:44 /user/viraj/skewedoutput/part-r-00000
> -rw-------   3 viraj users      19380 2010-11-23 03:44 /user/viraj/skewedoutput/part-r-00001
> -rw-------   3 viraj users       2090 2010-11-23 03:44 /user/viraj/skewedoutput/part-r-00002
> -rw-------   3 viraj users       9690 2010-11-23 03:44 /user/viraj/skewedoutput/part-r-00003
> -rw-------   3 viraj users       2090 2010-11-23 03:44 /user/viraj/skewedoutput/part-r-00004
> -rw-------   3 viraj users       2090 2010-11-23 03:44 /user/viraj/skewedoutput/part-r-00005
> -rw-------   3 viraj users          0 2010-11-23 03:44 /user/viraj/skewedoutput/part-r-00006
> -rw-------   3 viraj users          0 2010-11-23 03:44 /user/viraj/skewedoutput/part-r-00007
> -rw-------   3 viraj users          0 2010-11-23 03:44 /user/viraj/skewedoutput/part-r-00008
> -rw-------   3 viraj users          0 2010-11-23 03:44 /user/viraj/skewedoutput/part-r-00009
> {quote}
> Attaching input datasets.
> Viraj
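
For reference, the unevenness in the listing above can be quantified with a short Python sketch. The sizes are copied from the `hadoop fs -ls` output in the report; the `partition_skew` helper and the max-over-mean ratio are illustrative assumptions, not something Pig or Hadoop reports.

```python
# Part-file sizes in bytes, copied from the `hadoop fs -ls` listing
# in the report (part-r-00000 through part-r-00009).
part_sizes = [2090, 19380, 2090, 9690, 2090, 2090, 0, 0, 0, 0]


def partition_skew(sizes):
    """Summarize how unevenly output bytes are spread across part files."""
    total = sum(sizes)
    mean = total / len(sizes)
    return {
        "total_bytes": total,
        "mean_bytes": mean,
        "empty_parts": sum(1 for s in sizes if s == 0),
        # Hypothetical metric chosen for illustration: how many times
        # larger the biggest part file is than the average part file.
        "max_over_mean": max(sizes) / mean if mean else 0.0,
    }


summary = partition_skew(part_sizes)
print(summary)
```

For this listing, four of the ten part files are empty and the largest one (part-r-00001) holds a little over half of the 37,430 output bytes.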

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

