pig-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-5338) Prevent deep copy of DataBag into Jython List
Date Sat, 21 Apr 2018 04:35:00 GMT

    [ https://issues.apache.org/jira/browse/PIG-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446630#comment-16446630 ]

Koji Noguchi commented on PIG-5338:
-----------------------------------

Thanks Greg, Adam.

bq. although we'll also need to run (Scripting) e2e tests for verification.

Good idea. I ran the e2e tests with the patch as-is and got two failures:
Scripting.Scripting_5 and Scripting.Scripting_9.
The error message is pasted below.
{noformat}
2018-04-20 18:38:51,316 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: org.apache.pig.backend.executionengine.ExecException:
ERROR 2997: Unable to recreate exception from backed error: Error: org.apache.pig.backend.executionengine.ExecException:
ERROR 0: Exception while executing (Name: c: New For Each(false,false,false)[bag] - scope-21
Operator Key: scope-21): org.apache.pig.backend.executionengine.ExecException: ERROR 2078:
Caught error from UDF: org.apache.pig.scripting.jython.JythonFunction [Error executing function]
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:315)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:260)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:280)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:275)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1949)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error
from UDF: org.apache.pig.scripting.jython.JythonFunction [Error executing function]
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:358)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:369)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:359)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:408)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:325)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305)
	... 12 more
Caused by: java.io.IOException: Error executing function
	at org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:122)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
	... 17 more
Caused by: com.google.inject.ConfigurationException: Guice configuration errors:

1) Unable to method intercept: org.apache.pig.scripting.jython.JythonBag
  while locating org.apache.pig.scripting.jython.JythonBag

1 error
	at com.google.inject.internal.InjectorImpl.getProvider(InjectorImpl.java:1004)
	at com.google.inject.internal.InjectorImpl.getProvider(InjectorImpl.java:961)
	at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1013)
	at org.apache.pig.scripting.jython.JythonUtils.pigToPython(JythonUtils.java:133)
	at org.apache.pig.scripting.jython.JythonUtils.pigTupleToPyTuple(JythonUtils.java:153)
	at org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:116)
	... 18 more
Caused by: java.lang.IllegalArgumentException: Cannot subclass final class class org.apache.pig.scripting.jython.JythonBag
	at com.google.inject.internal.cglib.proxy.$Enhancer.generateClass(Enhancer.java:446)
	at com.google.inject.internal.cglib.core.$DefaultGeneratorStrategy.generate(DefaultGeneratorStrategy.java:25)
	at com.google.inject.internal.cglib.core.$AbstractClassGenerator.create(AbstractClassGenerator.java:216)
	at com.google.inject.internal.cglib.proxy.$Enhancer.createHelper(Enhancer.java:377)
	at com.google.inject.internal.cglib.proxy.$Enhancer.createClass(Enhancer.java:317)
	at com.google.inject.internal.ProxyFactory$ProxyConstructor._init_(ProxyFactory.java:246)
	at com.google.inject.internal.ProxyFactory.create(ProxyFactory.java:172)
	at com.google.inject.internal.ConstructorInjectorStore.createConstructor(ConstructorInjectorStore.java:89)
	at com.google.inject.internal.ConstructorInjectorStore.access$000(ConstructorInjectorStore.java:28)
	at com.google.inject.internal.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:36)
	at com.google.inject.internal.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:32)
	at com.google.inject.internal.FailableCache$1.apply(FailableCache.java:39)
	at com.google.inject.internal.util.$MapMaker$StrategyImpl.compute(MapMaker.java:549)
	at com.google.inject.internal.util.$MapMaker$StrategyImpl.compute(MapMaker.java:419)
	at com.google.inject.internal.util.$CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041)
	at com.google.inject.internal.FailableCache.get(FailableCache.java:50)
	at com.google.inject.internal.ConstructorInjectorStore.get(ConstructorInjectorStore.java:49)
	at com.google.inject.internal.ConstructorBindingImpl.initialize(ConstructorBindingImpl.java:125)
	at com.google.inject.internal.InjectorImpl.initializeJitBinding(InjectorImpl.java:521)
	at com.google.inject.internal.InjectorImpl.createJustInTimeBinding(InjectorImpl.java:847)
	at com.google.inject.internal.InjectorImpl.createJustInTimeBindingRecursive(InjectorImpl.java:772)
	at com.google.inject.internal.InjectorImpl.getJustInTimeBinding(InjectorImpl.java:256)
	at com.google.inject.internal.InjectorImpl.getBindingOrThrow(InjectorImpl.java:205)
	at com.google.inject.internal.InjectorImpl.getInternalFactory(InjectorImpl.java:853)
	at com.google.inject.internal.InjectorImpl.getProviderOrThrow(InjectorImpl.java:967)
	at com.google.inject.internal.InjectorImpl.getProvider(InjectorImpl.java:1000)
	... 23 more
{noformat}

> Prevent deep copy of DataBag into Jython List
> ---------------------------------------------
>
>                 Key: PIG-5338
>                 URL: https://issues.apache.org/jira/browse/PIG-5338
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Greg Phillips
>            Assignee: Greg Phillips
>            Priority: Major
>         Attachments: PIG-5338.patch
>
>
> Pig Python UDFs currently perform deep copies on Bags converting them into Jython PyLists.
> This can cause Jython UDFs to run out of memory and fail. A Jython DataBag which extends PyList could
> allow for iterative access to DataBag elements, while only performing a deep copy when necessary.
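
The idea in the issue description can be sketched in plain Java as a rough illustration. This is not the code in PIG-5338.patch: `LazyBagView` and `isMaterialized` are hypothetical names, a plain `Iterable` stands in for a Pig DataBag, and `java.util.AbstractList` stands in for Jython's PyList. Iteration streams from the source without copying, and the copy is made only the first time indexed access is requested.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical illustration only; not the JythonBag class from the patch.
 * Wraps an Iterable (standing in for a DataBag) and exposes it as a List.
 */
class LazyBagView<T> extends AbstractList<T> {
    private final Iterable<T> source; // stand-in for the underlying bag
    private List<T> copy;             // stays null until a copy is really needed

    LazyBagView(Iterable<T> source) {
        this.source = source;
    }

    /** Sequential access reads straight from the source; no copy is made. */
    @Override
    public Iterator<T> iterator() {
        return copy != null ? copy.iterator() : source.iterator();
    }

    /** Counts elements by iterating, so asking for the size does not copy either. */
    @Override
    public int size() {
        if (copy != null) {
            return copy.size();
        }
        int n = 0;
        for (T ignored : source) {
            n++;
        }
        return n;
    }

    /** Indexed access needs random-access storage, so it triggers the one-time copy. */
    @Override
    public T get(int index) {
        return materialize().get(index);
    }

    /** Reports whether the copy has been made yet. */
    boolean isMaterialized() {
        return copy != null;
    }

    private List<T> materialize() {
        if (copy == null) {
            copy = new ArrayList<>();
            for (T item : source) {
                copy.add(item);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        LazyBagView<String> view = new LazyBagView<>(Arrays.asList("a", "b", "c"));
        for (String s : view) {
            System.out.println(s);                 // a, b, c
        }
        System.out.println(view.isMaterialized()); // false: iteration did not copy
        System.out.println(view.get(1));           // b
        System.out.println(view.isMaterialized()); // true: get(int) forced the copy
    }
}
```

In this sketch a UDF that only loops over the bag never pays for the copy, while one that indexes into it pays once.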



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
