hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: Can I share datas for several map tasks?
Date Tue, 16 Jun 2009 13:51:48 GMT
The examples for my book include a JVM-reuse example in which static data is
shared between tasks run by the same JVM.
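Here is a minimal sketch of that pattern in plain Java, so it runs without a cluster. The data lives in a static field that is built the first time any task in the JVM asks for it, and later tasks running in a reused JVM skip the initialization. SharedDataHolder, get(), build() and SIZE are hypothetical names, not Hadoop API. The sketch assumes reuse is turned on, e.g. mapred.job.reuse.jvm.num.tasks set to -1, which can also be set per job with JobConf.setNumTasksToExecutePerJvm(-1). Keep the scope in mind: reuse applies only to tasks of the same job, and each concurrently running task slot has its own JVM. You therefore get one copy per JVM, not one per node.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper, not part of Hadoop: builds the shared data at most
 * once per JVM. With JVM reuse enabled, later tasks of the same job that
 * run in this JVM get the already-built array instead of rebuilding it.
 */
public final class SharedDataHolder {
    // Illustrative size; the original post used 1024 * 1024 * 16 ints.
    static final int SIZE = 1024 * 1024;

    // Counts how many times the (expensive) initialization really ran.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    private SharedDataHolder() {}

    // Initialization-on-demand holder: the JVM initializes Lazy, and so
    // calls build(), only on the first access to Lazy.DATA. Class
    // initialization is thread-safe and happens once per class loader.
    private static final class Lazy {
        static final int[] DATA = build();
    }

    /** Returns the shared array, building it on first use in this JVM. */
    public static int[] get() {
        return Lazy.DATA;
    }

    // Stand-in for the real initialization that setup() would otherwise repeat.
    private static int[] build() {
        INIT_COUNT.incrementAndGet();
        int[] data = new int[SIZE];
        data[0] = 12345;
        return data;
    }

    public static void main(String[] args) {
        // Simulate three tasks calling setup() one after another in one JVM.
        for (int task = 0; task < 3; task++) {
            int[] shared = SharedDataHolder.get();
            System.out.println("task " + task + ": data[0] = " + shared[0]);
        }
        System.out.println("initializations: " + INIT_COUNT.get());
    }
}
```

Running main prints data[0] = 12345 for all three simulated tasks, followed by "initializations: 1". In a real mapper, setup() would call SharedDataHolder.get() instead of writing into a static array directly.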

On Tue, Jun 16, 2009 at 1:08 AM, Hello World <snowloong@gmail.com> wrote:

> Thanks for your reply. Could you do me a favor and check something?
> I modified mapred-default.xml as follows:
> <property>
>   <name>mapred.job.reuse.jvm.num.tasks</name>
>   <value>-1</value>
>   <description>How many tasks to run per jvm. If set to -1, there is
>   no limit.
>   </description>
> </property>
> Then I ran bin/stop-all.sh and bin/start-all.sh to restart Hadoop.
>
> This is my program:
>
> import java.io.IOException;
> import java.util.StringTokenizer;
>
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Mapper;
>
> public class WordCount {
>
>   public static class TokenizerMapper
>        extends Mapper<Object, Text, Text, IntWritable>{
>
>     private final static IntWritable one = new IntWritable(1);
>     private Text word = new Text();
>     // Static: allocated once per JVM and visible to every task run in it
>     public static int[] ToBeSharedData = new int[1024 * 1024 * 16];
>
>     @Override
>     protected void setup(Context context
>             ) throws IOException, InterruptedException {
>         // Init shared data (setup() runs once per task, not once per JVM)
>         ToBeSharedData[0] = 12345;
>         System.out.println("setup shared data[0] = " + ToBeSharedData[0]);
>     }
>
>     @Override
>     public void map(Object key, Text value, Context context
>                     ) throws IOException, InterruptedException {
>       StringTokenizer itr = new StringTokenizer(value.toString());
>       while (itr.hasMoreTokens()) {
>         word.set(itr.nextToken());
>         context.write(word, one);
>       }
>       System.out.println("read shared data[0] = " + ToBeSharedData[0]);
>     }
>   }
> }
>
> First, how can I verify that "jvm reuse" is taking effect? I don't see any
> difference from before: the "top" command on Linux shows the same number of
> java processes and the same memory usage.
>
> Second, how can I make "ToBeSharedData" be initialized only once and then
> read by other map tasks on the same node? Or is this not a suitable
> programming style for map-reduce?
>
> By the way, I'm using hadoop-0.20.0 in pseudo-distributed mode on a single
> node.
> Thanks in advance.
>
> On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal <sharadag@yahoo-inc.com> wrote:
>
> >
> > snowloong wrote:
> > > Hi,
> > > I want to share some data structures among the map tasks on the same
> > > node (not through files). That is, if one map task has already
> > > initialized some data structures (e.g. an array or a list), can other
> > > map tasks share that memory and access it directly? I don't want to
> > > reinitialize the data, and I want to save some memory. Can Hadoop help
> > > me do this?
> >
> > You can enable JVM reuse across tasks. See mapred.job.reuse.jvm.num.tasks
> > in mapred-default.xml for usage. Then you can cache the data in a static
> > variable in your mapper.
> >
> > - Sharad
> >
>
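On the first question, "top" is a weak signal. On one node, at most as many child JVMs run at once as there are task slots (mapred.tasktracker.map.tasks.maximum, 2 by default), whether or not they are reused. A more direct check is to log two values from each task's setup(): the JVM's name and a static per-JVM task counter. Then compare the task logs. Without reuse, every task reports count 1. With reuse, consecutive tasks show the same name and a growing count. JvmReuseProbe and describe() are hypothetical names, not Hadoop API. The "pid@hostname" name format is typical of HotSpot JVMs but is not guaranteed. If nothing changes, also check where the property was set. In 0.20, site overrides normally go in conf/mapred-site.xml or the job configuration, while mapred-default.xml holds the shipped defaults.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical diagnostic helper, not part of Hadoop. Call describe() from a
 * mapper's setup() and print the result; with JVM reuse, consecutive tasks
 * in one JVM report the same JVM name and an increasing task number.
 */
public final class JvmReuseProbe {
    // Static, so the count carries over to the next task only if the JVM is reused.
    private static final AtomicInteger TASKS_IN_THIS_JVM = new AtomicInteger();

    private JvmReuseProbe() {}

    /** Records one more task in this JVM and returns a log line describing it. */
    public static String describe() {
        int n = TASKS_IN_THIS_JVM.incrementAndGet();
        // Usually "pid@hostname" on HotSpot; the exact format is JVM-specific.
        String jvm = ManagementFactory.getRuntimeMXBean().getName();
        return "jvm=" + jvm + " task#" + n;
    }

    public static void main(String[] args) {
        // Simulates two tasks running back to back in the same JVM.
        System.out.println(describe());
        System.out.println(describe());
    }
}
```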



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
