pig-dev mailing list archives

From Dmitriy Ryaboy <dvrya...@gmail.com>
Subject Re: Wilbur, AvroStorage
Date Wed, 07 Dec 2011 07:33:10 GMT
Yeah, please post it to the Pig JIRA, preferably with an example of how to reproduce the error (better
yet, a test that demonstrates the fix).
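
Such a test could look something like the following self-contained sketch. Note the `Field` class, the `TUPLE` constant, and the wrapper value here are stand-ins for Pig's real `ResourceFieldSchema`, `DataType.TUPLE`, and `AvroStorageUtils.PIG_TUPLE_WRAPPER`; an actual patch would test against those classes directly.

```java
// Standalone sketch of a regression test for the null-name NPE.
// Stand-in types; the real classes live in org.apache.pig.
public class TupleWrapperCheck {
    // Assumption: placeholder value, not necessarily the real constant.
    static final String PIG_TUPLE_WRAPPER = "t";
    static final byte TUPLE = 110;

    static class Field {
        final byte type;
        final String name;
        Field(byte type, String name) { this.type = type; this.name = name; }
    }

    static boolean isTupleWrapper(Field f) {
        // The null check is the fix: before it, a TUPLE field with no
        // name caused name.equals(...) to throw a NullPointerException.
        return f.type == TUPLE
                && f.name != null
                && f.name.equals(PIG_TUPLE_WRAPPER);
    }

    public static void main(String[] args) {
        // A TUPLE field with a null name must return false, not throw.
        System.out.println(isTupleWrapper(new Field(TUPLE, null)));  // false
        // A TUPLE field carrying the wrapper name is recognized.
        System.out.println(isTupleWrapper(new Field(TUPLE, "t")));   // true
    }
}
```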

On Dec 6, 2011, at 11:25 PM, Russell Jurney <russell.jurney@gmail.com> wrote:

> I fixed the bug, in AvroStorageUtils.java:
> 
>    /** check whether it is just a wrapped tuple */
>    public static boolean isTupleWrapper(ResourceFieldSchema pigSchema) {
>        // guard against a null field name before calling equals()
>        return pigSchema.getType() == DataType.TUPLE
>                && pigSchema.getName() != null
>                && pigSchema.getName().equals(AvroStorageUtils.PIG_TUPLE_WRAPPER);
>    }
> 
> The script now works.  Will make a patch.  Should I make a ticket?
> 
> On Tue, Dec 6, 2011 at 5:36 PM, Dmitriy Ryaboy <dvryaboy@gmail.com> wrote:
> 
>> If you send a pull request to Wilbur, I can merge it. But we are also still
>> supporting piggybank, as Wilbur never really got off the ground...
>> 
>> D
>> 
>> On Tue, Dec 6, 2011 at 3:47 PM, Russell Jurney <russell.jurney@gmail.com>
>> wrote:
>>> I'm debugging the AvroStorage UDF in piggybank for this blog post:
>>> 
>>> http://datasyndrome.com/post/13707537045/booting-the-analytics-application-events-ruby
>>> 
>>> The script is:
>>> 
>>> messages = LOAD '/tmp/messages.avro' USING AvroStorage();
>>> user_groups = GROUP messages BY user_id;
>>> per_user = FOREACH user_groups {
>>>   sorted = ORDER messages BY message_id DESC;
>>>   GENERATE group AS user_id, sorted AS messages;
>>> };
>>> DESCRIBE per_user;
>>>> per_user: {user_id: int,messages: {(message_id: int,topic: chararray,user_id: int)}}
>>> STORE per_user INTO '/tmp/per_user.avro' USING AvroStorage();
>>> 
>>> The error is:
>>> 
>>> Pig Stack Trace
>>> ---------------
>>> ERROR 1002: Unable to store alias per_user
>>> 
>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias per_user
>>> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1596)
>>> at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
>>> at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>>> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>>> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>>> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>>> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
>>> at org.apache.pig.Main.run(Main.java:487)
>>> at org.apache.pig.Main.main(Main.java:108)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>> Caused by: java.lang.NullPointerException
>>> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.isTupleWrapper(AvroStorageUtils.java:327)
>>> at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convert(PigSchema2Avro.java:82)
>>> at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convert(PigSchema2Avro.java:105)
>>> at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convertRecord(PigSchema2Avro.java:151)
>>> at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convert(PigSchema2Avro.java:62)
>>> at org.apache.pig.piggybank.storage.avro.AvroStorage.checkSchema(AvroStorage.java:502)
>>> at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:65)
>>> at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
>>> at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
>>> at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
>>> at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
>>> at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
>>> at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
>>> at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
>>> at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>>> at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
>>> at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:292)
>>> at org.apache.pig.PigServer.compilePp(PigServer.java:1360)
>>> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1297)
>>> at org.apache.pig.PigServer.execute(PigServer.java:1286)
>>> at org.apache.pig.PigServer.access$400(PigServer.java:125)
>>> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1591)
>>> ... 13 more
>>> 
>>> 
>>> I need to fix this, which I take it means committing a patch to the
>>> current piggybank? I've got some time... is it worthwhile to resurrect
>>> Wilbur on GitHub and move piggybank over?
>>> 
>>> --
>>> Russell Jurney
>>> twitter.com/rjurney
>>> russell.jurney@gmail.com
>>> datasyndrome.com
>> 
> 
> 
> 
> -- 
> Russell Jurney
> twitter.com/rjurney
> russell.jurney@gmail.com
> datasyndrome.com
