pig-dev mailing list archives

From "Greg Phillips (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-5338) Prevent deep copy of DataBag into Jython List
Date Thu, 26 Apr 2018 15:55:00 GMT

    [ https://issues.apache.org/jira/browse/PIG-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454432#comment-16454432 ]

Greg Phillips commented on PIG-5338:
------------------------------------

Thanks [~knoguchi]! I was able to run e2e successfully on a small cluster in a reasonable
amount of time (220 minutes). In addition to resolving the e2e error noted before, I've
added tests, documentation, and the ability to return a native Java DataBag from the Jython
UDF. I'm not certain returning a DataBag is the correct way to go; I may add more functionality
to the JythonBag to make it writable if that seems like a better way to proceed.

> Prevent deep copy of DataBag into Jython List
> ---------------------------------------------
>
>                 Key: PIG-5338
>                 URL: https://issues.apache.org/jira/browse/PIG-5338
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Greg Phillips
>            Assignee: Greg Phillips
>            Priority: Major
>         Attachments: PIG-5338.001.patch, PIG-5338.patch
>
>
> Pig Python UDFs currently perform deep copies of Bags, converting them into Jython PyLists.
> This can cause Jython UDFs to run out of memory and fail. A Jython DataBag that extends PyList could
> allow iterative access to DataBag elements, performing a deep copy only when necessary.
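To illustrate the idea in the description above, here is a minimal CPython sketch of a lazily copied bag. It is not Pig's or Jython's API: `LazyBag`, `source_factory`, and `copies_made` are hypothetical names, and a plain Python iterable stands in for a Java DataBag. Sequential iteration streams elements from the source, while indexing triggers a single copy into a list.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional


class LazyBag:
    """Hypothetical stand-in for the proposed JythonBag (not Pig's real class).

    Wraps a re-iterable source and copies it into a list only when
    random access is requested.
    """

    def __init__(self, source_factory: Callable[[], Iterable[Any]]):
        # source_factory returns a fresh iterator over the underlying bag,
        # assumed to behave like calling DataBag.iterator() repeatedly.
        self._source_factory = source_factory
        self._copy: Optional[List[Any]] = None  # filled on first random access
        self.copies_made = 0  # counts full copies, for illustration only

    def __iter__(self) -> Iterator[Any]:
        # Sequential access streams from the source unless a copy already exists.
        if self._copy is not None:
            return iter(self._copy)
        return iter(self._source_factory())

    def _materialize(self) -> List[Any]:
        # The "deep copy only when necessary" step: done at most once.
        if self._copy is None:
            self._copy = list(self._source_factory())
            self.copies_made += 1
        return self._copy

    def __getitem__(self, index):
        # Indexing and slicing need random access, so materialize first.
        return self._materialize()[index]

    def __len__(self) -> int:
        # Count by streaming rather than copying when no copy exists yet.
        if self._copy is not None:
            return len(self._copy)
        return sum(1 for _ in self._source_factory())


rows = [(i, i * i) for i in range(5)]
bag = LazyBag(lambda: iter(rows))
total = sum(square for _, square in bag)  # streams; no copy is made
```

A UDF that only loops over its input bag never pays for the copy, which is the memory-saving case the issue targets; code that indexes into the bag still works, at the cost of one materialization.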



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
