hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext
Date Tue, 04 Aug 2009 17:19:14 GMT

     [ https://issues.apache.org/jira/browse/PIG-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Pradeep Kamath updated PIG-901:

    Status: Patch Available  (was: Open)

PIG-901-trunk.patch is for the trunk. The change makes SliceWrapper serialize only the ExecType
instead of the whole PigContext, since the ExecType is the only part of PigContext used on deserialization.
The package import list that Daniel referred to is a static member of PigContext that is
explicitly set in SliceWrapper.makeRecordReader(), so it is already taken care of.
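A minimal Java sketch of the idea behind the patch, assuming a Hadoop Writable-style write()/readFields() pair: the split writes only the ExecType's name and rebuilds it on the task side, while the static import list is set explicitly when the task-side context is created. ExecTypeOnlySplit, FakePigContext, makeContext() and the enum values are hypothetical stand-ins, not Pig's actual SliceWrapper or PigContext code.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: an input split that serializes only the ExecType it
 * needs instead of the whole context object. All names are illustrative.
 */
public class ExecTypeOnlySplit {

    // Stand-in for Pig's execution-type enum (values assumed for illustration).
    enum ExecType { LOCAL, MAPREDUCE }

    // Stand-in for PigContext: carries the exec type plus potentially large state.
    static class FakePigContext {
        // Like PigContext's package import list, this is static state,
        // so it is never written into an individual split.
        static List<String> packageImportList = new ArrayList<>();
        final ExecType execType;
        final List<String> bigState = new ArrayList<>(); // e.g. properties, jars
        FakePigContext(ExecType execType) { this.execType = execType; }
    }

    private ExecType execType;

    ExecTypeOnlySplit() {}  // no-arg constructor used before readFields()
    ExecTypeOnlySplit(FakePigContext ctx) { this.execType = ctx.execType; }

    // Writable-style write: only the enum name goes over the wire.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(execType.name());
    }

    // Writable-style readFields: rebuild the ExecType from its name.
    public void readFields(DataInput in) throws IOException {
        execType = ExecType.valueOf(in.readUTF());
    }

    // Task side: rebuild a minimal context and set the static import list
    // explicitly (where the list comes from is an assumption; here it is a parameter).
    public FakePigContext makeContext(List<String> imports) {
        FakePigContext.packageImportList = imports;
        return new FakePigContext(execType);
    }

    public ExecType getExecType() { return execType; }

    static byte[] toBytes(ExecTypeOnlySplit split) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        split.write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        FakePigContext ctx = new FakePigContext(ExecType.MAPREDUCE);
        for (int i = 0; i < 10_000; i++) ctx.bigState.add("property-" + i);

        byte[] wire = toBytes(new ExecTypeOnlySplit(ctx));
        ExecTypeOnlySplit copy = new ExecTypeOnlySplit();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(wire)));

        System.out.println(wire.length + " bytes, execType=" + copy.getExecType());
    }
}
```

Running main() prints `11 bytes, execType=MAPREDUCE`: the two-byte writeUTF length prefix plus the nine characters of the enum name, no matter how much state the context holds.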

It is a good suggestion to include a test case to check that even with a sizeable PigContext,
we actually create small input splits. However, doing this with the current Pig code layout means
opening up PigServer and JobControlCompiler so that we can compile a Pig script up to job creation
and then, instead of submitting the job to Hadoop, instantiate PigInputFormat with the jobConf
and get the input splits. This may require some design changes, which we should address at
some point for these kinds of tests. For now, there is a regression test in the patch to ensure
the package import list is correctly handled, and we have manually tested to ensure the split
size is small (on the order of KBs).
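The size check discussed above can be illustrated in isolation, without going through PigServer or JobControlCompiler. This hypothetical sketch fills a stand-in context with many properties, measures Java object serialization of the whole context (assumed here as a rough model of the old behaviour) against writing only the ExecType, and asserts that the latter stays under an arbitrary 4 KB threshold. SplitSizeCheck, BigContext and the threshold are illustrative, not part of the patch.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a "splits stay small" check. Names are stand-ins,
 * not Pig or Hadoop classes.
 */
public class SplitSizeCheck {

    enum ExecType { LOCAL, MAPREDUCE }

    // Stand-in for a sizeable PigContext; Serializable so the old form can be measured.
    static class BigContext implements Serializable {
        private static final long serialVersionUID = 1L;
        ExecType execType = ExecType.MAPREDUCE;
        List<String> properties = new ArrayList<>();
    }

    // Old form (assumed): serialize the entire context object into the split.
    static int wholeContextSize(BigContext ctx) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(ctx);
        }
        return bytes.size();
    }

    // New form: serialize only the ExecType's name.
    static int execTypeOnlySize(BigContext ctx) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(ctx.execType.name());
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        BigContext ctx = new BigContext();
        for (int i = 0; i < 50_000; i++) ctx.properties.add("some.property." + i + "=value");

        int before = wholeContextSize(ctx);
        int after = execTypeOnlySize(ctx);
        System.out.printf("whole context: %d bytes, ExecType only: %d bytes%n", before, after);

        // Arbitrary threshold standing in for "order of KBs".
        if (after > 4 * 1024) throw new AssertionError("split too large: " + after);
    }
}
```

The whole-context size grows with every property added, while the ExecType-only size is fixed by the length of the enum name.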

> InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext
> ------------------------------------------------------------------------------------
>                 Key: PIG-901
>                 URL: https://issues.apache.org/jira/browse/PIG-901
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.3.1
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.4.0
>         Attachments: PIG-901-1.patch, PIG-901-branch-0.3.patch, PIG-901-trunk.patch
> InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext.
> SliceWrapper only needs ExecType, so the entire PigContext should not be serialized; only
> the ExecType should be serialized.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
