hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Map reduce classes
Date Wed, 16 Apr 2008 20:07:06 GMT

That design is fine.

You should read your map in the configure method of the reducer.

There is a MapFile format supported by Hadoop, but MapFiles tend to be pretty
slow.  I usually find it better to just load my hash table by hand.  If you
do that, you can use whatever format you like.
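As a minimal sketch (not from the original thread) of what "load the hash
table by hand in the reducer's configure method" can look like with the old
org.apache.hadoop.mapred API in use at the time: the class name FilterReducer,
the filter.keys.path property, the default path filtered.keys, and the
tab-separated key/count file layout are all assumptions for illustration.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Iterator;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class FilterReducer extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, LongWritable> {

  private final HashMap<String, Long> counts = new HashMap<String, Long>();

  @Override
  public void configure(JobConf job) {
    // Load the hand-written side file once per reduce task,
    // before any reduce() calls.
    Path sideFile = new Path(job.get("filter.keys.path", "filtered.keys"));
    try {
      FileSystem fs = sideFile.getFileSystem(job);
      BufferedReader in =
          new BufferedReader(new InputStreamReader(fs.open(sideFile)));
      String line;
      while ((line = in.readLine()) != null) {
        String[] parts = line.split("\t");   // assumed key<TAB>count layout
        counts.put(parts[0], Long.parseLong(parts[1]));
      }
      in.close();
    } catch (IOException e) {
      throw new RuntimeException("Could not load side file " + sideFile, e);
    }
  }

  public void reduce(Text key, Iterator<LongWritable> values,
                     OutputCollector<Text, LongWritable> output,
                     Reporter reporter) throws IOException {
    // Only emit keys that survived the first job's filtering.
    if (!counts.containsKey(key.toString())) {
      return;
    }
    long sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new LongWritable(sum));
  }
}

The second job can point the reducer at the file with
job.set("filter.keys.path", ...) before submission; any plain text format
works as long as the writer and the reducer agree on it.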


On 4/16/08 12:41 PM, "Aayush Garg" <aayush.garg@gmail.com> wrote:

> Hi,
> 
> The current structure of my program is:
> 
> public class Upper {
>   class Reduce {
>     reduce(K1, V1, K2, V2) {
>       // Count the frequency of each key.
>       // Add the output to a HashMap(key, value) instead of calling
>       // output.collect().
>     }
>   }
> 
>   void run() {
>     runJob();
>     // Now eliminate the top-frequency keys from the HashMap built in the
>     // reduce function; only at this point is the HashMap complete.
>     // Then write this HashMap to a file in a format such that the next
>     // MapReduce job can use it, with the keys of this HashMap becoming the
>     // keys seen by the mapper of that job. How, and which format, should I
>     // choose? Is this design and approach OK?
>   }
> 
>   public static void main() {}
> }
> I hope you have got my question.
> 
> Thanks,
> 
> 
> On Wed, Apr 16, 2008 at 8:33 AM, Amar Kamat <amarrk@yahoo-inc.com> wrote:
> 
>> Aayush Garg wrote:
>> 
>>> Hi,
>>> 
>>> Are you sure that another MR is required for eliminating some rows?
>>> Can't I just somehow eliminate them from main() once I know which keys
>>> need to be removed?
>>>
>> Can you provide some more details on how exactly you are filtering?
>> Amar

