hadoop-mapreduce-user mailing list archives

From "Liu, Keyan (NSN - CN/Beijing)" <keyan....@nsn.com>
Subject RE: How to use the reduce result in next code part
Date Mon, 11 Jun 2012 03:35:42 GMT
Hi,

The reduce step aggregates the multiple partial rowkey_lists into one,
and the reducer sorts the complete rowkey_list. In the next part of my
code I would like to use that complete rowkey_list. However, the context
cannot be used or passed outside the reducer.

How can I collect the cloned copies in memory?

Thanks and regards,

-----Original Message-----
From: ext Harsh J [mailto:harsh@cloudera.com] 
Sent: Monday, June 11, 2012 11:22 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: How to use the reduce result in next code part

Hi,

Can the logic that requires your rowkey_list not be implemented within a
single reduce(key, <List> values) call itself? If you require the
whole list before processing, and the whole list is small, then
collecting cloned copies of the values in memory is also one way out.
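
As a rough sketch of that in-memory approach, written in plain Java with
no Hadoop dependency: MutableValue and reusingIterable are hypothetical
stand-ins for a Writable such as Text and for the values iterable handed
to reduce(). Hadoop reuses a single value object across iterations, so
each value has to be copied before it is stored; keeping the reference
itself would leave a list in which every entry is the last value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class CollectClonedCopies {

    // Hypothetical stand-in for a mutable Hadoop value type such as Text.
    static final class MutableValue {
        private String value = "";
        void set(String v) { value = v; }
        String get() { return value; }
    }

    // Mimics the reduce-side values iterable: next() always returns the
    // same object and only changes its contents.
    static Iterable<MutableValue> reusingIterable(final List<String> rowKeys) {
        final MutableValue shared = new MutableValue();
        return new Iterable<MutableValue>() {
            public Iterator<MutableValue> iterator() {
                final Iterator<String> it = rowKeys.iterator();
                return new Iterator<MutableValue>() {
                    public boolean hasNext() { return it.hasNext(); }
                    public MutableValue next() {
                        shared.set(it.next());
                        return shared;
                    }
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }

    // Copies each value out of the reused object, then sorts the full list.
    static List<String> collectSorted(Iterable<MutableValue> values) {
        List<String> rowKeyList = new ArrayList<String>();
        for (MutableValue v : values) {
            // String is immutable, so storing get() is a safe copy.
            rowKeyList.add(v.get());
        }
        Collections.sort(rowKeyList);
        return rowKeyList;
    }

    public static void main(String[] args) {
        List<String> partial = Arrays.asList("row3", "row1", "row2");
        System.out.println(collectSorted(reusingIterable(partial)));
        // prints [row1, row2, row3]
    }
}
```

In a real job the collected list lives in the reduce task's JVM, not in
the driver, so any logic that needs the complete list would also have to
run there, for instance at the end of reduce() or in the reducer's
cleanup() method.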

On Mon, Jun 11, 2012 at 8:39 AM, Liu, Keyan (NSN - CN/Beijing)
<keyan.liu@nsn.com> wrote:
> Hi All,
>
> I am using MapReduce to scan an HBase region to get the rowkey_list that
> is related to one query.
>
> In the map phase, each mapper outputs a partial rowkey_list. In the
> reduce phase, the reducer collects and sorts all the rowkeys.
>
> If I need to use the rowkey_list produced by the reducer, how can I
> get the rowkey_list out of the reduce step?
>
> I have tried writing the reduce output to HDFS ("/part-r-00000") and
> then reading the result back from HDFS, but I found this too inefficient.
>
> How can I use the reduce result in the next part of my code? Is there
> an API or example that can be used?
>
> Thanks.
>
> Regards,
>
> William Liu



-- 
Harsh J
