avro-user mailing list archives

From Scott Carey <scottca...@apache.org>
Subject Re: Pig duplicate records
Date Wed, 21 Sep 2011 20:55:11 GMT
You will want to ask this question on the Pig user mailing list.

org.apache.pig.piggybank.storage.avro.AvroStorage is maintained by the Pig
project, so you will get more help there.

On 9/21/11 4:34 AM, "Alex Holmes" <grep.alex@gmail.com> wrote:

>Hi all,
>I have a simple schema
>{"name": "Record", "type": "record",
>  "fields": [
>    {"name": "name", "type": "string"},
>    {"name": "id", "type": "int"}
>  ]
>}
>which I use to write 2 records to an Avro file, and my reader code
>(which reads the file and dumps the records) verifies that there are 2
>records in the file:
>When using this file with pig and AvroStorage, pig seems to think
>there are 4 records:
>grunt> REGISTER /app/hadoop/lib/avro-1.5.4.jar;
>grunt> REGISTER /app/pig-0.9.0/contrib/piggybank/java/piggybank.jar;
>grunt> REGISTER /app/pig-0.9.0/build/ivy/lib/Pig/json-simple-1.1.jar;
>grunt> REGISTER 
>grunt> REGISTER 
>grunt> raw = LOAD 'test.v1.avro' USING
>grunt> dump raw;
>Successfully read 4 records (825 bytes) from:
>Successfully stored 4 records (46 bytes) in:
>Total records written : 4
>Total bytes written : 46
>I'm sure I'm doing something wrong (again)!
>Many thanks,
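
One way to double-check the record count independently of both Pig and the
reader code mentioned above is to sum the per-block record counts stored in
the Avro container file itself. The following is a minimal sketch, not code
from this thread: the function names (read_long, read_bytes, count_records)
are hypothetical, it uses only the Python standard library, and it assumes
the standard Avro object container layout (4-byte magic, metadata map,
16-byte sync marker, then blocks of "record count, byte size, data, sync
marker").

```python
MAGIC = b"Obj\x01"   # first four bytes of an Avro object container file
SYNC_SIZE = 16       # length of the sync marker that follows each block


def read_long(stream, allow_eof=False):
    """Decode one Avro long (zigzag varint). Returns None at a clean EOF
    if allow_eof is set, which is how the end of the block list is detected."""
    accum = 0
    shift = 0
    first = True
    while True:
        byte = stream.read(1)
        if not byte:
            if first and allow_eof:
                return None
            raise EOFError("unexpected end of file inside a varint")
        first = False
        accum |= (byte[0] & 0x7F) << shift
        if not (byte[0] & 0x80):
            break
        shift += 7
    # Undo zigzag encoding: even values are non-negative, odd are negative.
    return (accum >> 1) ^ -(accum & 1)


def read_bytes(stream):
    """Read an Avro bytes/string value: a long length, then that many bytes."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of file inside a bytes value")
    return data


def count_records(stream):
    """Return the total number of records declared by the file's blocks."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro container file")
    # File metadata is an Avro map of string -> bytes; skip over it.
    while True:
        entries = read_long(stream)
        if entries == 0:
            break
        if entries < 0:
            read_long(stream)      # negative count is followed by a byte size
            entries = -entries
        for _ in range(entries):
            read_bytes(stream)     # key, e.g. avro.schema or avro.codec
            read_bytes(stream)     # value
    sync = stream.read(SYNC_SIZE)

    total = 0
    while True:
        count = read_long(stream, allow_eof=True)
        if count is None:
            return total
        size = read_long(stream)
        stream.read(size)          # skip the (possibly compressed) payload
        if stream.read(SYNC_SIZE) != sync:
            raise ValueError("sync marker mismatch, file may be corrupt")
        total += count
```

With the file opened in binary mode, count_records(open("test.v1.avro", "rb"))
would report the count the file itself declares. If that is 2 while Pig
reports 4, the duplication is happening on the loading side rather than in
the file.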
