hadoop-pig-dev mailing list archives

From Stefan Groschupf ...@101tec.com>
Subject Re: [jira] Updated: (PIG-114) store one alias/logicalPlan twice leads to instantiation of StoreFunc as LoadFunc
Date Thu, 21 Feb 2008 10:11:44 GMT
Hi,
Apache's JIRA is down, so let me answer this by mail so I don't forget
until tomorrow. :) I will attach the text as a reference to the issue as
soon as JIRA is available again.
From a first look into the code, I guess this is what happens.
The data is stored successfully on disk the first time you call
store, so POStore adds an entry to materializedResults.
This is basically a hash map from an OperatorKey (just a name) to a
LocalResult (a pointer to the file you just wrote).
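
A minimal sketch of the cache described above, assuming plain strings in place of OperatorKey and LocalResult. The class and method names here are made up for illustration and are not Pig's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the cache described above; not Pig's real classes.
class MaterializedResultsSketch {
    // operator key (just a name) -> path of the file the store wrote
    private final Map<String, String> materializedResults = new HashMap<>();

    // Called after a successful store: remember where the result lives.
    void recordStore(String operatorKey, String outputPath) {
        materializedResults.put(operatorKey, outputPath);
    }

    // Returns the cached path, or null if this operator was never stored.
    String lookup(String operatorKey) {
        return materializedResults.get(operatorKey);
    }
}
```

A second store for the same key would find the entry via lookup() and try to read the file back instead of recomputing the plan.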

If you now trigger store again for the same alias, Pig tries to
optimize performance by reusing the output file you just stored.
It does so by first checking whether there is already a
materializedResults entry.
That is the case here, so in theory the result could be reused: just
read it and write it again to a new path.
Now there are a couple of problems. First, in your test case you delete
the output file (/tmp/testPigOutput), but Pig tries to read this
file in again to write it out again, which means you read from and write
to the same file at the same time. Another problem: your test deletes
this file between the store calls, so it can't be read back.

Now Pig comes in. Pig tries to read this file back in with the same
object you used for storing it.
So the object needs to implement both LoadFunc and StoreFunc, which is
not the case in your test: you only implement StoreFunc, which makes
sense from my point of view. See POLoad, line 57:
  lf = (LoadFunc) PigContext.instantiateFuncFromSpec(fileSpec.getFuncSpec());
  // the return value can be a StoreFunc only as well.
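
To illustrate why the cast fails, here is a small sketch, assuming two empty stand-in interfaces in place of Pig's LoadFunc and StoreFunc. All names below are hypothetical; only the casting behavior is the point:

```java
// Hypothetical stand-ins for Pig's LoadFunc and StoreFunc interfaces.
interface LoadFuncSketch {}
interface StoreFuncSketch {}

// Implements only the store side, like DummyStoreFunc in the test.
class StoreOnlyFunc implements StoreFuncSketch {}

// Implements both sides, as most existing functions do.
class LoadStoreFunc implements LoadFuncSketch, StoreFuncSketch {}

class CastSketch {
    // Mirrors the unchecked cast: throws ClassCastException for a store-only object.
    static LoadFuncSketch uncheckedLoad(Object func) {
        return (LoadFuncSketch) func;
    }

    // Guarded variant: returns null when the object cannot be used for loading.
    static LoadFuncSketch checkedLoad(Object func) {
        return (func instanceof LoadFuncSketch) ? (LoadFuncSketch) func : null;
    }
}
```

With a guard like checkedLoad(), the caller could detect a store-only function and pick another strategy instead of failing with a ClassCastException.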

This has worked so far since most StoreFunc and LoadFunc implementations
live in one class, but relying on that is not a good idea.

So now the question to the Pig developers: how can we solve this
problem?
Only cache materialized files in case we have both a load and a store
func available?
Reprocess all required plans in case we cannot load a materialized
result?
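
One way the two options could be combined, as a rough sketch only. The class, enum, and parameter names are invented for illustration, and nothing here reflects how Pig actually decides:

```java
import java.io.File;

// Hypothetical decision helper; not part of Pig.
class ReuseDecisionSketch {
    enum Action { REUSE_MATERIALIZED, REPROCESS_PLAN }

    // canLoad: whether the function used for storing can also load.
    // materialized: the file written by the earlier store, or null if none.
    static Action decide(boolean canLoad, File materialized) {
        // Reuse only if the result can be read back and still exists on disk.
        if (canLoad && materialized != null && materialized.exists()) {
            return Action.REUSE_MATERIALIZED;
        }
        // Otherwise fall back to running the required plans again.
        return Action.REPROCESS_PLAN;
    }
}
```

In the test from the issue, both conditions would fail (store-only function, deleted file), so this sketch would fall back to reprocessing.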

Any thoughts?

Stefan





On Feb 20, 2008, at 3:59 PM, Johannes Zillmann (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/PIG-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Johannes Zillmann updated PIG-114:
> ----------------------------------
>
>    Attachment: pigPatch-storeTwice-620665.patch
>
>> store one alias/logicalPlan twice leads to instantiation of StoreFunc as LoadFunc
>> ---------------------------------------------------------------------------------
>>
>>                Key: PIG-114
>>                URL: https://issues.apache.org/jira/browse/PIG-114
>>            Project: Pig
>>         Issue Type: Bug
>>         Components: impl
>>           Reporter: Johannes Zillmann
>>        Attachments: pigPatch-storeTwice-620665.patch
>>
>>
>> Calling PigServer#store() twice for an alias results in the following
>> exception:
>> {noformat}
>> java.lang.RuntimeException: java.lang.ClassCastException: org.apache.pig.test.DummyStoreFunc cannot be cast to org.apache.pig.LoadFunc
>> 	at org.apache.pig.backend.local.executionengine.POLoad.<init>(POLoad.java:59)
>> 	at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.doCompile(LocalExecutionEngine.java:167)
>> 	at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.doCompile(LocalExecutionEngine.java:184)
>> 	at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.doCompile(LocalExecutionEngine.java:184)
>> 	at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.compile(LocalExecutionEngine.java:111)
>> 	at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.compile(LocalExecutionEngine.java:90)
>> 	at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.compile(LocalExecutionEngine.java:1)
>> 	at org.apache.pig.PigServer.store(PigServer.java:330)
>> 	at org.apache.pig.PigServer.store(PigServer.java:317)
>> 	at org.apache.pig.test.StoreTwiceTest.testIt(StoreTwiceTest.java:31)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> 	at java.lang.reflect.Method.invoke(Method.java:589)
>> 	at junit.framework.TestCase.runTest(TestCase.java:164)
>> 	at junit.framework.TestCase.runBare(TestCase.java:130)
>> 	at junit.framework.TestResult$1.protect(TestResult.java:110)
>> 	at junit.framework.TestResult.runProtected(TestResult.java:128)
>> 	at junit.framework.TestResult.run(TestResult.java:113)
>> 	at junit.framework.TestCase.run(TestCase.java:120)
>> 	at junit.framework.TestSuite.runTest(TestSuite.java:228)
>> 	at junit.framework.TestSuite.run(TestSuite.java:223)
>> 	at org.junit.internal.runners.OldTestClassRunner.run(OldTestClassRunner.java:35)
>> 	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
>> 	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>> 	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
>> 	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
>> 	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
>> 	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
>> Caused by: java.lang.ClassCastException: org.apache.pig.test.DummyStoreFunc cannot be cast to org.apache.pig.LoadFunc
>> 	at org.apache.pig.backend.local.executionengine.POLoad.<init>(POLoad.java:57)
>> 	... 28 more
>> {noformat}
>> I will attach a patch with a test scenario for this. Basically the
>> code is as follows:
>> {noformat}
>> PigServer pig = new PigServer(ExecType.LOCAL);
>> pig.registerQuery("A = LOAD 'test/org/apache/pig/test/StoreTwiceTest.java' USING "
>>         + DummyLoadFunc.class.getName() + "();");
>> pig.registerQuery("B = FOREACH A GENERATE * ;");
>> File outputFile = new File("/tmp/testPigOutput");
>> outputFile.delete();
>> pig.store("A", outputFile.getAbsolutePath(), DummyStoreFunc.class.getName() + "()");
>> outputFile.delete();
>> pig.store("B", outputFile.getAbsolutePath(), DummyStoreFunc.class.getName() + "()");
>> outputFile.delete();
>> assertEquals(2, _storedTuples.size());
>> {noformat}
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com


