pig-dev mailing list archives

From "Viraj Bhat (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3312) Pig duplicates avro records
Date Thu, 09 May 2013 00:37:15 GMT

    [ https://issues.apache.org/jira/browse/PIG-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652591#comment-13652591 ]

Viraj Bhat commented on PIG-3312:
---------------------------------

Hi Hans,
Could you try upgrading only piggybank.jar, which contains the AvroStorage-related classes,
from Pig 0.8.1 to Pig 0.10.1? I did not see this problem in Pig 0.10.1 or later. Running:

user_data= LOAD 'twitter_files/twitter.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
describe user_data;
dump user_data;

produces each record only once:
(miguno,Rock: Nerf paper, scissors is fine.,1366150681)
(BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
(Test1,One Tweet,1366154490)

However, you cannot read twitter.json with AvroStorage, because it is a JSON text file rather than an Avro data file. Loading it fails with:

Caused by: java.io.IOException: Not a data file.
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
        at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:218)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:169)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:145)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:293)
        at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
        ... 18 more
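The "Not a data file." message is raised by Avro's DataFileStream when the input does not begin with the four-byte Avro object container magic ('O', 'b', 'j', 1); a JSON text file fails that check. As a minimal sketch, assuming the file has first been copied to the local filesystem (for example with `hadoop fs -get`), the header can be checked before pointing AvroStorage at it. The class and method names below are hypothetical and are not part of Pig or Avro.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: checks whether a local file starts with the Avro container magic.
public class AvroMagicCheck {
    // Avro object container files begin with the bytes 'O', 'b', 'j', 1.
    static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    // Returns true if the given header bytes start with the Avro magic.
    static boolean hasAvroMagic(byte[] header) {
        if (header.length < AVRO_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, AVRO_MAGIC.length), AVRO_MAGIC);
    }

    // Reads the first four bytes of a local file and compares them with the magic.
    static boolean isAvroDataFile(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            byte[] header = in.readNBytes(AVRO_MAGIC.length);
            return hasAvroMagic(header);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Paths.get(arg);
            System.out.println(p + ": "
                + (isAvroDataFile(p) ? "Avro data file" : "not an Avro data file"));
        }
    }
}
```

Run against local copies of the attachments, this sketch would be expected to report twitter.avro as an Avro data file and twitter.json as not one, matching the behavior of AvroStorage above.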

Viraj
                
> Pig duplicates avro records
> ---------------------------
>
>                 Key: PIG-3312
>                 URL: https://issues.apache.org/jira/browse/PIG-3312
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.1
>            Reporter: Hans Uhlig
>         Attachments: twitter.avro, twitter.avsc, twitter.json
>
>
> Pig reports each Avro record twice.
> To Reproduce:
> * Place attached files on hdfs
> * run pig
> > register lib/piggybank.jar
> > register lib/avro-1.7.4.jar
> > register lib/json-simple-1.1.jar
> > register lib/jackson-mapper-asl-1.6.0.jar
> > register lib/jackson-core-asl-1.6.0.jar
> > user_data= LOAD 'twitter.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
> > dump user_data;
> Result: 
> (miguno,Rock: Nerf paper, scissors is fine.,1366150681)
> (BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
> (Test1,One Tweet,1366154490)
> (miguno,Rock: Nerf paper, scissors is fine.,1366150681)
> (BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
> (Test1,One Tweet,1366154490)

--
This message is automatically generated by JIRA.
