hadoop-common-user mailing list archives

From Jianxin Wang <wangjx...@gmail.com>
Subject Re: how to get all different values for each key
Date Wed, 03 Aug 2011 06:22:51 GMT
Hi Harsh,
    After the map phase I can get all the values for one key, but I want to
dedup these values and keep only the unique ones. Right now I just do it as
in the image.

    I think the following code (which uses a HashSet to dedup) is not efficient.
Thanks :)

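// Imports for the enclosing file. LongsWritable is my own Writable that
// wraps a long[]; it is not a Hadoop class.
import java.io.IOException;
import java.util.HashSet;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;

private static class MyReducer extends
        Reducer<LongWritable, LongWritable, LongWritable, LongsWritable> {
    // Reused across reduce() calls so they are not reallocated per key
    private final HashSet<Long> uids = new HashSet<Long>();
    private final LongsWritable unique_uids = new LongsWritable();

    @Override
    public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        uids.clear();
        // Collect the values for this key; the HashSet drops duplicates
        for (LongWritable v : values) {
            uids.add(v.get());
        }
        // Copy the distinct values into a primitive array for output
        long[] l = new long[uids.size()];
        int i = 0;
        for (long uid : uids) {
            l[i++] = uid;
        }
        unique_uids.Set(l); // setter defined on my LongsWritable class
        context.write(key, unique_uids);
    }
}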
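One possible alternative to the HashSet, sketched here in plain Java with no Hadoop dependency: if the values for each key could be made to arrive in sorted order (for example through a secondary sort with a composite key and a custom grouping comparator, which the reducer above does not set up), duplicates would be adjacent, so a single pass comparing each value with the previous one is enough. The class and method names (`SortedDedup`, `dedupSorted`) are hypothetical, and the sorted-input requirement is an assumption.

```java
import java.util.Arrays;
import java.util.List;

public class SortedDedup {
    // Collapses runs of equal values in a sequence that is ASSUMED to be sorted.
    // Unsorted input would keep non-adjacent duplicates.
    static long[] dedupSorted(Iterable<Long> sortedValues) {
        long[] buf = new long[16];
        int n = 0;
        for (long v : sortedValues) {
            if (n > 0 && buf[n - 1] == v) {
                continue; // equal to the previous value, so it is a duplicate
            }
            if (n == buf.length) {
                buf = Arrays.copyOf(buf, n * 2); // grow the buffer when full
            }
            buf[n++] = v;
        }
        return Arrays.copyOf(buf, n); // trim to the number of distinct values
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(2L, 3L, 3L, 4L);
        System.out.println(Arrays.toString(dedupSorted(values))); // prints [2, 3, 4]
    }
}
```

The trade-off is that the sort work moves into the framework's shuffle, and the reducer then needs only constant extra state per value instead of a hash set.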


2011/8/3 Harsh J <harsh@cloudera.com>

> Use MapReduce :)
>
> If map output: (key, value)
> Then reduce input becomes: (key, [iterator of values across all maps
> with (key, value)])
>
> I believe this is very similar to the wordcount example, but minus the
> summing. For a given key, you get all the values that carry that key
> in the reducer. Have you tried to run a simple program to achieve this
> before asking? Or is something specifically not working?
>
> On Wed, Aug 3, 2011 at 9:20 AM, Jianxin Wang <wangjx798@gmail.com> wrote:
> > HI,
> >    I have many <key,value> pairs now, and want to get all the distinct
> > values for each key. Which way is efficient for this work?
> >
> >   For example, input: <1,2> <1,3> <1,4> <1,3> <2,1> <2,2>
> >   output: <1,2/3/4> <2,1/2>
> >
> >   Thanks!
> >
> > walter
> >
>
>
>
> --
> Harsh J
>
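To make the grouping described in the reply concrete, here is a small in-memory simulation, in plain Java without Hadoop classes, of the example from the original question: each <key,value> pair is emitted as-is, values are grouped by key as the shuffle would group them, and only distinct values are kept per key. A `TreeMap` stands in for the sorted shuffle and a `TreeSet` for the dedup; the class and method names are hypothetical, and a real job would use a `Mapper` and `Reducer` instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;
import java.util.TreeSet;

public class DistinctValuesPerKey {
    // Groups (key, value) pairs by key and keeps the distinct values per key.
    // Returns one line per key as "key<TAB>v1/v2/...", keys and values ascending.
    static List<String> distinctValues(long[][] pairs) {
        // TreeMap keeps keys sorted, like the keys a reducer receives
        Map<Long, TreeSet<Long>> grouped = new TreeMap<>();
        for (long[] p : pairs) {
            grouped.computeIfAbsent(p[0], k -> new TreeSet<>()).add(p[1]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, TreeSet<Long>> e : grouped.entrySet()) {
            StringJoiner joined = new StringJoiner("/");
            for (long v : e.getValue()) {
                joined.add(Long.toString(v));
            }
            out.add(e.getKey() + "\t" + joined);
        }
        return out;
    }

    public static void main(String[] args) {
        long[][] input = {{1, 2}, {1, 3}, {1, 4}, {1, 3}, {2, 1}, {2, 2}};
        distinctValues(input).forEach(System.out::println);
    }
}
```

Running `main` prints `1	2/3/4` and `2	1/2`, matching the expected output in the question.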
