Date: Wed, 21 Sep 2011 16:42:10 -0700
Subject: Re: Pig duplicate records
From: Jeff Zhang
To: user@avro.apache.org, user@pig.apache.org

Seems this is a Pig bug. Maybe it is caused by AvroStorage.
According to the log, Pig read 4 records and output 4 records.

On Wed, Sep 21, 2011 at 1:55 PM, Scott Carey <scottcarey@apache.org> wrote:

> You will want to ask the pig user mailing list this question.
>
> org.apache.pig.piggybank.storage.avro.AvroStorage is maintained by the Pig
> project and you will get more help from there.
>
> On 9/21/11 4:34 AM, "Alex Holmes" <grep.alex@gmail.com> wrote:
>
> >Hi all,
> >
> >I have a simple schema
> >
> >{"name": "Record", "type": "record",
> >  "fields": [
> >    {"name": "name", "type": "string"},
> >    {"name": "id", "type": "int"}
> >  ]
> >}
> >
> >which I use to write 2 records to an Avro file, and my reader code
> >(which reads the file and dumps the records) verifies that there are 2
> >records in the file:
> >
> >Record@1e9e5c73[name=r1,id=1]
> >Record@ed42d08[name=r2,id=2]
> >
> >When using this file with pig and AvroStorage, pig seems to think
> >there are 4 records:
> >
> >grunt> REGISTER /app/hadoop/lib/avro-1.5.4.jar;
> >grunt> REGISTER /app/pig-0.9.0/contrib/piggybank/java/piggybank.jar;
> >grunt> REGISTER /app/pig-0.9.0/build/ivy/lib/Pig/json-simple-1.1.jar;
> >grunt> REGISTER /app/pig-0.9.0/build/ivy/lib/Pig/jackson-core-asl-1.6.0.jar;
> >grunt> REGISTER /app/pig-0.9.0/build/ivy/lib/Pig/jackson-mapper-asl-1.6.0.jar;
> >grunt> raw = LOAD 'test.v1.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage;
> >grunt> dump raw;
> >..
> >Input(s):
> >Successfully read 4 records (825 bytes) from:
> >"hdfs://localhost:9000/user/aholmes/test.v1.avro"
> >
> >Output(s):
> >Successfully stored 4 records (46 bytes) in:
> >"hdfs://localhost:9000/tmp/temp2039109003/tmp1924774585"
> >
> >Counters:
> >Total records written : 4
> >Total bytes written : 46
> >..
> >(r1,1)
> >(r2,2)
> >(r1,1)
> >(r2,2)
> >
> >I'm sure I'm doing something wrong (again)!
> >
> >Many thanks,
> >Alex

--
Best Regards

Jeff Zhang