pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4697) Serialize relevant part of the udfcontext per vertex to reduce payload size
Date Fri, 16 Oct 2015 21:16:05 GMT

     [ https://issues.apache.org/jira/browse/PIG-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4697:
    Attachment: PIG-4697-2.patch

Rebased the patch on top of PIG-4703, with some additional changes:
   - pigContext and tez.plan are not serialized unless the vertex manager is PigGraceShuffleVertexManager.
   - UDFContext is reset in the input and output formats to force serialization again and to
     protect against stale state when threads are reused.
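
The thread-reuse concern in the second item can be illustrated with a minimal sketch. It assumes the context lives in a per-thread holder that is populated lazily from a configuration map; the class TaskContext and its methods are hypothetical names for illustration and are not Pig's UDFContext API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-thread context holder (illustration only, not Pig's UDFContext).
final class TaskContext {
    // One instance per thread; a reused thread sees the same instance again.
    private static final ThreadLocal<TaskContext> CURRENT =
            ThreadLocal.withInitial(TaskContext::new);

    private final Map<String, String> values = new HashMap<>();
    private boolean loaded = false;

    static TaskContext current() {
        return CURRENT.get();
    }

    // Populates the context from the configuration only on the first call after a reset.
    void loadIfNeeded(Map<String, String> conf) {
        if (!loaded) {
            values.clear();
            values.putAll(conf);
            loaded = true;
        }
    }

    // Drops cached state so the next loadIfNeeded() rebuilds it from the configuration.
    void reset() {
        values.clear();
        loaded = false;
    }

    String value(String key) {
        return values.get(key);
    }
}
```

Without the reset() call, a thread that is handed a second task would keep serving the first task's values, because loadIfNeeded() skips the reload once the context is marked as loaded.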

> Serialize relevant part of the udfcontext per vertex to reduce payload size
> ---------------------------------------------------------------------------
>                 Key: PIG-4697
>                 URL: https://issues.apache.org/jira/browse/PIG-4697
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>         Attachments: PIG-4697-1.patch, PIG-4697-2.patch
>   What HCatLoader/HCatStorer puts in the UDFContext is huge. If a Pig script has several of
> them, the data sent to the Tez AM and the data the Tez AM sends to tasks both become very
> large, causing RPC-limit-exceeded and OOM issues respectively. If Pig serializes only the
> part of the UDFContext that each vertex requires, it will save a lot. The HCat folks are
> also looking at cleaning up what goes into the conf (it ends up serializing the whole job
> conf, not just hive-site.xml) and moving the common part out to be shared by all HCat
> loaders and storers.
> Also looking at other options for faster and more compact serialization. Will create
> separate JIRAs for that. Will use PIG-4653 to clean up all Pig config other than the
> UDFContext.
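
The per-vertex idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the attached patch: the UdfContextFilter class, its method names, and the use of plain String keys are all hypothetical, and Java's built-in serialization stands in for whatever format Pig actually uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper (illustration only, not Pig's implementation).
final class UdfContextFilter {
    private UdfContextFilter() {}

    // Returns a new map holding only the entries a given vertex needs.
    // Keys that are requested but absent from the full context are skipped.
    static Map<String, Properties> forVertex(Map<String, Properties> fullContext,
                                             Set<String> neededKeys) {
        Map<String, Properties> subset = new HashMap<>();
        for (String key : neededKeys) {
            Properties props = fullContext.get(key);
            if (props != null) {
                subset.put(key, props);
            }
        }
        return subset;
    }

    // Measures the payload size using standard Java serialization (an assumption).
    static int serializedSize(Map<String, Properties> context) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new HashMap<>(context));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

With several loaders and storers in one script, each vertex would then carry only the entries for the UDFs it actually runs, so the per-vertex payload shrinks relative to shipping the whole context everywhere.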

This message was sent by Atlassian JIRA
