pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-5338) Prevent deep copy of DataBag into Jython List
Date Thu, 03 May 2018 22:54:00 GMT

    [ https://issues.apache.org/jira/browse/PIG-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463168#comment-16463168 ]

Rohini Palaniswamy commented on PIG-5338:

A really good idea. The patch looks good. A couple of comments:

1) Can you rename the variables getIndex and getIterator to currIndex and iterator?
2) In the list___contains__(PyObject o) method, JythonUtils.pythonToPig(o) can be done once,
outside the loop.
3) getIterator.hasNext() is redundant in get(), as we already check size() before. If the size
and index calculations do not actually match and hasNext() is false, then something is not
right, and it is better to get an error in that case.
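
For illustration only, points 1) to 3) above could look roughly like the sketch below. It is
not taken from the patch: LazyBagView, toPig and the String element type are hypothetical
stand-ins for JythonBag, JythonUtils.pythonToPig and Pig tuples, chosen so the example compiles
without Pig or Jython on the classpath. The reset-on-backward-access behaviour is likewise an
assumption.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for JythonBag: a lazily iterated view over a collection.
class LazyBagView {
    private final List<String> source;  // stands in for the wrapped DataBag
    private Iterator<String> iterator;  // point 1: named iterator, not getIterator
    private int currIndex;              // point 1: index of the element iterator.next() returns next

    LazyBagView(List<String> source) {
        this.source = source;
        this.iterator = source.iterator();
        this.currIndex = 0;
    }

    int size() {
        return source.size();
    }

    // Hypothetical stand-in for JythonUtils.pythonToPig(o).
    static String toPig(Object o) {
        return String.valueOf(o);
    }

    boolean contains(Object o) {
        // Point 2: convert once, outside the loop, instead of once per element.
        String target = toPig(o);
        for (String element : source) {
            if (element.equals(target)) {
                return true;
            }
        }
        return false;
    }

    String get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size());
        }
        if (index < currIndex) {
            // Assumed behaviour: restart iteration when an earlier element is requested.
            iterator = source.iterator();
            currIndex = 0;
        }
        // Point 3: no hasNext() check. size() was checked above, so if the iterator
        // runs out here, next() throws NoSuchElementException and exposes the mismatch.
        String value;
        do {
            value = iterator.next();
            currIndex++;
        } while (currIndex <= index);
        return value;
    }
}
```

With a view over ["a", "b", "c"], get(2) advances the iterator three times and returns "c",
and a later get(0) restarts iteration and returns "a".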

1)  If JythonBag does not have the method, you perform a deep copy. But shouldn't the
invocation then be on the PyList? Why are methods called on the PyList only the next time?
How does it work?
                pyList = ((JythonBag)methodInvocation.getThis()).toPyList();
            }
            return methodInvocation.proceed();

1)  Can you use assertEquals instead of assertTrue, and also hardcode the value of wordCount
instead of computing it every time?
Assert.assertTrue(((DataBag) t.get(0)).size() == wordCount);

> Prevent deep copy of DataBag into Jython List
> ---------------------------------------------
>                 Key: PIG-5338
>                 URL: https://issues.apache.org/jira/browse/PIG-5338
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Greg Phillips
>            Assignee: Greg Phillips
>            Priority: Major
>         Attachments: PIG-5338.001.patch, PIG-5338.patch
>
> Pig Python UDFs currently perform deep copies on Bags converting them into Jython PyLists.
> This can cause Jython UDFs to run out of memory and fail. A Jython DataBag which extends PyList could
> allow for iterative access to DataBag elements, while only performing a deep copy when necessary.

This message was sent by Atlassian JIRA
