pig-user mailing list archives

From Prashant Kommireddi <prash1...@gmail.com>
Subject Re: Bug when COUNTing bag of tuple ?
Date Fri, 09 Mar 2012 05:58:41 GMT
Hi Manish,

How is "chararray" involved in the case of a COUNT? Pig has a 'long' type, which
is the return type of the COUNT function.

-Prashant

On Thu, Mar 8, 2012 at 9:43 PM, Manish Bhoge <manishbhoge@rocketmail.com> wrote:

> Pig does its own interpretation of data types. I doubt whether chararray is
> the right choice for storing and counting an integer value.
> Thank You,
>
> -----Original Message-----
> From: Dmitriy Ryaboy <dvryaboy@gmail.com>
> Date: Thu, 8 Mar 2012 09:07:21
> To: user@pig.apache.org<user@pig.apache.org>
> Reply-To: user@pig.apache.org
> Cc: user@pig.apache.org<user@pig.apache.org>
> Subject: Re: Bug when COUNTing bag of tuple ?
>
> You are supposed to use COUNT_STAR to count all rows. It's one of those
> "nulls are really strange beasts" things.
>
> On Mar 8, 2012, at 9:02 AM, Bill Graham <billgraham@gmail.com> wrote:
>
> > The issue here is that COUNT increments by 1 for each tuple in the bag
> > whose first field is not null.
> >
> > I've found this behavior strange as well, though, so I'd like to hear
> > others' take on why this is a feature and not a bug (if in fact that's the
> > case).
> >
> > On Thu, Mar 8, 2012 at 8:55 AM, Kevin Lion <klion@ubikod.com> wrote:
> >
> >> Hello,
> >>
> >> I think there is a bug in Pig when using COUNT on a bag of tuples with
> >> empty elements. Here is a minimal script to reproduce it:
> >>
> >> I have this CSV file:
> >> ,a
> >> 1,a
> >> 2,a
> >> ,a
> >> 3,b
> >> 4,b
> >> 5,b
> >>
> >> I use this script:
> >> test = LOAD 'test.csv' USING org.apache.pig.builtin.PigStorage(',') AS
> >> (key:chararray, value:chararray);
> >> test = GROUP test BY value;
> >> DUMP test;
> >> test = FOREACH test GENERATE group, COUNT(test);
> >> DUMP test;
> >>
> >> And the output is:
> >> (a,{(,a),(1,a),(2,a),(,a)})
> >> (b,{(3,b),(4,b),(5,b)})
> >> (a,2)
> >> (b,3)
> >>
> >> Is this normal? I was expecting:
> >> (a,{(,a),(1,a),(2,a),(,a)})
> >> (b,{(3,b),(4,b),(5,b)})
> >> (a,*4*)
> >> (b,3)
> >>
> >> Regards,
> >>
> >> Kevin Lion
> >> Capptain.com - Pilot your Apps
> >>
> >
>
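To make the COUNT vs COUNT_STAR distinction above concrete, here is a small Python sketch (not Pig code) that mimics the two built-ins on Kevin's sample data. It assumes PigStorage loaded the empty first field as null (which the (a,2) result suggests). The helper names group_by_value, pig_count and pig_count_star are hypothetical, and the first-field check follows Bill's description of COUNT rather than being copied from Pig's source.

```python
# Hypothetical simulation of Pig's COUNT vs COUNT_STAR semantics (not Pig's actual code).
from collections import defaultdict

# Kevin's CSV as (key, value) tuples; assumption: empty key fields load as null (None).
rows = [
    (None, "a"),
    ("1", "a"),
    ("2", "a"),
    (None, "a"),
    ("3", "b"),
    ("4", "b"),
    ("5", "b"),
]

def group_by_value(rows):
    """Rough equivalent of GROUP test BY value: map each value to its bag of tuples."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return dict(groups)

def pig_count(bag):
    """COUNT as Bill describes it: +1 only for tuples whose first field is not null."""
    return sum(1 for t in bag if t is not None and len(t) > 0 and t[0] is not None)

def pig_count_star(bag):
    """COUNT_STAR: +1 for every tuple in the bag, nulls included."""
    return len(bag)

groups = group_by_value(rows)
for group, bag in groups.items():
    print(group, "COUNT =", pig_count(bag), "COUNT_STAR =", pig_count_star(bag))
# prints:
# a COUNT = 2 COUNT_STAR = 4
# b COUNT = 3 COUNT_STAR = 3
```

In the Pig script itself, the corresponding change is to replace COUNT(test) with COUNT_STAR(test) in the FOREACH ... GENERATE statement.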
