hadoop-common-user mailing list archives

From Pere Ferrera <ferrerabert...@gmail.com>
Subject Re: Problem with using BinSedesTuple as Mapper key
Date Thu, 03 May 2012 09:29:25 GMT
Hi Gayatri,
You might want to look at Pangool (http://pangool.net), a low-level
enhancement of the default Hadoop API that uses tuples and simplifies
grouping, sorting and joining datasets in Hadoop.
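
The thread does not establish why equal tuples end up in different reduce
calls, so the following is only a sketch of one thing worth checking, not a
diagnosis. Hadoop's default HashPartitioner picks the reducer from
key.hashCode(), so a key class needs a content-based hashCode() for equal
keys to land on the same reducer. The classes IdentityKey and ContentKey
below are hypothetical stand-ins written for this illustration; they are not
BinSedesTuple and make no claim about how BinSedesTuple behaves.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner.getPartition().
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical key that inherits Object.hashCode(): two instances with
    // the same fields usually get different hash codes.
    static final class IdentityKey {
        final List<String> fields;

        IdentityKey(String... fields) {
            this.fields = Arrays.asList(fields);
        }
    }

    // Hypothetical key whose equals() and hashCode() depend only on content.
    static final class ContentKey {
        final List<String> fields;

        ContentKey(String... fields) {
            this.fields = Arrays.asList(fields);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ContentKey
                    && fields.equals(((ContentKey) o).fields);
        }

        @Override
        public int hashCode() {
            return fields.hashCode();
        }
    }

    public static void main(String[] args) {
        int reducers = 4;

        ContentKey a = new ContentKey("url1", "category1");
        ContentKey b = new ContentKey("url1", "category1");
        // Equal content gives equal hash codes, hence the same partition.
        System.out.println("content-based: "
                + (partition(a, reducers) == partition(b, reducers)));

        IdentityKey c = new IdentityKey("url1", "category1");
        IdentityKey d = new IdentityKey("url1", "category1");
        // Not guaranteed either way: the result depends on identity hashes.
        System.out.println("identity-based: "
                + (partition(c, reducers) == partition(d, reducers)));
    }
}
```

Partitioning is only half of the picture: within a reducer, records are
grouped by the job's sort or grouping comparator, so that comparator also has
to treat two tuples with the same fields as equal. If both checks pass for
the key class in use, the cause lies elsewhere.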

On Mon, Apr 23, 2012 at 7:30 AM, Gayatri Rao <rgayatri1@gmail.com> wrote:

> Hello,
>
> I am using BinSedesTuple as a mapper key to emit a tuple of values, but
> identical keys do not seem to reach the same reducer, so the values are
> not aggregated.
> Is it not recommended to use it as a mapper key?
>
> For example in my mapper I emit
>
> Mapper:
> Output key : BinSedesTuple  value: int
>
>
> Example key:
> tuple.append(url);
> tuple.append(category);
>
> Reducer:
> Input key: BinSedesTuple value: int
> Output key: Text value: int
>
> Example output:
> url1 category1 3
> url1 category1 2
>
> In the reducer output I get multiple records with the same key. My
> expected output is
> url1 category1 5
>
> Any ideas what might be wrong?
>
>
> Thanks,
> Gayatri
>
