pig-dev mailing list archives

From "Ido Hadanny (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-3411) pig skewed join with a big table causes “Split metadata size exceeded 10000000”
Date Tue, 06 Aug 2013 06:32:48 GMT
Ido Hadanny created PIG-3411:

             Summary: pig skewed join with a big table causes “Split metadata size exceeded 10000000”
                 Key: PIG-3411
                 URL: https://issues.apache.org/jira/browse/PIG-3411
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.10.0
         Environment: Pig version 0.10.0-cdh3u4a
Hadoop 0.20.2-cdh3u4a
            Reporter: Ido Hadanny

We have a Pig join between a small (16M rows) distinct table and a big (6B rows) skewed table.
A regular join finishes in 2 hours (after some tweaking). We then tried a skewed join and were
able to improve the performance to 20 minutes.
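
For reference, a minimal sketch of the two join variants, assuming hypothetical relations big and small keyed on a hypothetical field id (the actual aliases and keys are not in the report):

```pig
-- Regular join: with a skewed key distribution in big,
-- a few hot keys overload a handful of reducers.
joined = JOIN big BY id, small BY id;

-- Skewed join: Pig first runs a sampling job over the first
-- (skewed) relation to build a key-distribution histogram, then
-- spreads the heavy keys across multiple reducers.
joined_skewed = JOIN big BY id, small BY id USING 'skewed';
```

Note that the skewed relation must be listed first in the JOIN; the sampling job over it is the SAMPLER referred to below.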

HOWEVER, when we try a bigger skewed table (19B rows), we get this message from the SAMPLER job:

Split metadata size exceeded 10000000. Aborting job job_201305151351_21573 [ScriptRunner]
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:48)

This is reproducible every time we use the skewed join, and does not happen when we use the regular join.

We tried setting mapreduce.jobtracker.split.metainfo.maxsize=-1, and we can see it's there
in the job.xml file, but it doesn't change anything!
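
This is how we set it, a per-job override from inside the Pig script (which is what puts it into job.xml):

```pig
-- Per-job override: -1 disables the split metadata size limit.
-- This lands in job.xml, as observed above.
-- Assumption (not confirmed here): on an MR1 cluster such as CDH3,
-- this limit may be enforced by the JobTracker daemon itself, in
-- which case it would have to be set in the JobTracker's
-- mapred-site.xml and the JobTracker restarted, and a per-job
-- setting would be ignored.
set mapreduce.jobtracker.split.metainfo.maxsize -1;
```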

What's happening here? Is this a bug in the distribution sample created by the skewed join?
Why doesn't changing the parameter to -1 help?

Also available at http://stackoverflow.com/q/17163112/574187


This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
