pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Russell Jurney (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-2440) AvroStorage won't store :(
Date Wed, 21 Dec 2011 02:03:30 GMT

     [ https://issues.apache.org/jira/browse/PIG-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Russell Jurney updated PIG-2440:
--------------------------------

    Component/s: piggybank
        Summary: AvroStorage won't store :(  (was: AvroStorage relations stop working after
using DUMP)
    
> AvroStorage won't store :(
> --------------------------
>
>                 Key: PIG-2440
>                 URL: https://issues.apache.org/jira/browse/PIG-2440
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.9.1, 0.10, 0.9.2
>         Environment: Mac OS X, running pig trunk
>            Reporter: Russell Jurney
>              Labels: avro, happy, pants, pig, sad, storage, storefunc, udf
>             Fix For: 0.9.1, 0.10, 0.9.2
>
>
> I am creating Avro records according to the instructions/code at https://github.com/rjurney/Collecting-Data
 They look like this:
>     {
>         "namespace": "agile.data.avro",
>         "name": "Email",
>         "type": "record",
>         "fields": [
>             {"name":"message_id", "type": ["string", "null"]},
>             {"name":"from","type": ["string", "null"]},
>             {"name":"to","type": [{"type":"array", "items":"string"}, "null"]},
>             {"name":"cc","type": [{"type":"array", "items":"string"}, "null"]},
>             {"name":"bcc","type": [{"type":"array", "items":"string"}, "null"]},
>             {"name":"reply_to", "type": [{"type":"array", "items":"string"}, "null"]},
>             {"name":"subject", "type": ["string", "null"]},
>             {"name":"body", "type": ["string", "null"]},
>             {"name":"date", "type": ["string", "null"]}
>         ]
>     }
> I have applied the patch at PIG-2411 to get Pig to store bags in Avro arrays.  I am running
pig in local mode via: pig -l /tmp -x local -v
> The script is:
> REGISTER /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
> REGISTER /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
> REGISTER /me/pig/contrib/piggybank/java/piggybank.jar
> REGISTER /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
> REGISTER /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar
> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
> messages = LOAD '/tmp/10000_emails.avro' USING AvroStorage();
> smaller = FOREACH messages GENERATE from, to;
> pairs = FOREACH smaller GENERATE from, FLATTEN(smaller.to) AS to;
> STORE pairs INTO '/tmp/mail_pairs.avro' USING AvroStorage();
> 2011-12-20 17:58:25,705 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,719 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,722 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,737 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,740 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,751 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,755 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,757 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,760 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,762 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,766 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig
features used in the script: UNKNOWN
> 2011-12-20 17:58:25,804 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,808 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,810 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
- File concatenation threshold: 100 optimistic? false
> 2011-12-20 17:58:25,812 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 3
> 2011-12-20 17:58:25,813 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged 1 map-only splittees.
> 2011-12-20 17:58:25,813 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged 1 out of total 3 MR operators.
> 2011-12-20 17:58:25,813 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 2
> 2011-12-20 17:58:25,813 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,817 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,817 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig
script settings are added to the job
> 2011-12-20 17:58:25,818 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2011-12-20 17:58:25,822 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up multi store job
> 2011-12-20 17:58:25,826 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-12-20 17:58:25,826 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
> 2011-12-20 17:58:25,930 [Thread-22] WARN  org.apache.hadoop.mapred.JobClient - No job
jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> 2011-12-20 17:58:26,327 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
> 2011-12-20 17:58:26,330 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2117: Unexpected
error when launching map reduce job.
> 2011-12-20 17:58:26,330 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException:
ERROR 1002: Unable to store alias pairs
> 	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1553)
> 	at org.apache.pig.PigServer.registerQuery(PigServer.java:541)
> 	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:943)
> 	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
> 	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
> 	at org.apache.pig.Main.run(Main.java:523)
> 	at org.apache.pig.Main.main(Main.java:148)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected
error when launching map reduce job.
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:311)
> 	at org.apache.pig.PigServer.launchPlan(PigServer.java:1271)
> 	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1256)
> 	at org.apache.pig.PigServer.execute(PigServer.java:1246)
> 	at org.apache.pig.PigServer.access$400(PigServer.java:127)
> 	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1548)
> 	... 13 more
> Caused by: java.lang.NullPointerException
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:193)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
> 	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
> 	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> 	at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
> 	at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> 	at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message