From: Sandy
To: core-user@hadoop.apache.org
Date: Wed, 20 Aug 2008 12:43:02 -0500
Subject: Re: pseudo-global variable construction

Thank you very much, Paco and Jason. It works!

For any users who may be curious what this looks like in code, here is a
small snippet of mine:

file: myLittleMRProgram.java

package org.apache.hadoop.examples;

public static class Reduce extends MapReduceBase implements Reducer {
    private int nTax = 0;

    public void configure(JobConf job) {
        super.configure(job);
        String tax = job.get("nTax");
        nTax = Integer.parseInt(tax);
    }

    public void reduce(/* ... */) throws IOException {
        ....
        System.out.println("nTax is: " + nTax);
    }
}

....

main() {
    ....
    conf.set("nTax", other_args.get(2));
    JobClient.runJob(conf);
    ....
    return 0;
}
--------
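For a fuller picture, the same pattern fleshed out into one self-contained
job might look roughly like the following. This is only a sketch against the
0.17/0.18-era org.apache.hadoop.mapred API; the line-counting map and all
names here are illustrative assumptions, not the actual program above:

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class MyLittleMRProgram {

    // Illustrative map: emit each input line with a count of one.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            output.collect(value, ONE);
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        private int nTax = 0;

        // configure() runs once per task, on the remote node, after the
        // serialized JobConf has been shipped over -- this is where the
        // per-job parameter is read back out.
        public void configure(JobConf job) {
            super.configure(job);
            nTax = job.getInt("nTax", 0);   // 0 if "nTax" was never set
        }

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum * nTax));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(MyLittleMRProgram.class);
        conf.setJobName("ntax-demo");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Serialize the user-supplied value into the job configuration;
        // every map and reduce task will see it in configure().
        conf.set("nTax", args[2]);
        JobClient.runJob(conf);
    }
}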
-SM

On Tue, Aug 19, 2008 at 5:02 PM, Jason Venner wrote:

> Since the map & reduce tasks generally run in separate Java virtual
> machines, and on machines distinct from the one running your main task,
> there is no sharing of variables between the main task and the map or
> reduce tasks.
>
> The standard way is to store the variable in the Configuration (or
> JobConf) object in your main task. Then, in the configure method of your
> map and reduce task classes, extract the variable value from the JobConf
> object.
>
> You will need to override the configure method in your map and reduce
> classes.
>
> This also requires that the variable value be serializable. For lots of
> large variables this can be expensive.
>
> Sandy wrote:
>
>> Hello,
>>
>> My M/R program is going smoothly, except for one small problem. I have
>> a "global" variable that is set by the user (and thus in the main
>> function) that I want one of my reduce functions to access. This is a
>> read-only variable. After some reading in the forums, I tried something
>> like this:
>>
>> file: MyGlobalVars.java
>>
>> package org.apache.hadoop.examples;
>>
>> public class MyGlobalVars {
>>     static public int nTax;
>> }
>> ------
>>
>> file: myLittleMRProgram.java
>>
>> package org.apache.hadoop.examples;
>>
>> map function() {
>>     System.out.println("in map function, nTax is: " + MyGlobalVars.nTax);
>> }
>> ....
>> main() {
>>     MyGlobalVars.nTax = other_args.get(2);
>>     System.out.println("in main function, nTax is: " + MyGlobalVars.nTax);
>>     ....
>>     JobClient.runJob(conf);
>>     ....
>>     return 0;
>> }
>> --------
>>
>> When I run it, I get:
>>
>> in main function, nTax is 20 (which is what I want)
>> in map function, nTax is 0 (<--- this is not right)
>>
>> I am a little confused about how to resolve this. I apologize in
>> advance if this is a blatant Java error; I only began programming in
>> the language a few weeks ago.
>>
>> Since MapReduce tries to avoid the whole shared-memory scene, I am more
>> than willing to have each reduce function receive a local copy of this
>> user-defined value. However, I am a little confused about the best way
>> to do this. As I see it, my options are:
>>
>> 1.) Write the user-defined value to HDFS in the main function, and have
>> it read from HDFS in the reduce function. I can't quite figure out the
>> code for this, though. I know how to specify -an- input file for the
>> map reduce task, but if I did it this way, won't I need to specify two
>> separate input files?
>>
>> 2.) Put it in the construction of the reduce object (I saw this
>> mentioned in the archives). How would I accomplish this exactly when
>> the value is user-defined? Parameter passing? If so, won't this require
>> me to change the underlying MapReduceBase (which makes me a touch
>> nervous, since I'm still very new to Hadoop)?
>>
>> What would be the easiest way to do this?
>>
>> Thanks in advance for the help. I appreciate your time.
>>
>> -SM
>
> --
> Jason Venner
> Attributor - Program the Web
> Attributor is hiring Hadoop Wranglers and coding wizards; contact if
> interested.
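For anyone curious about option 1 from the quoted question above (writing
the value to HDFS and re-reading it in each task): it does work, though for
a single int the JobConf route is much simpler. The side file is opened
directly through the FileSystem API, so it is not a second job input and
nothing about the input specification changes. A rough, untested sketch,
with a purely hypothetical /tmp/nTax.txt path and made-up helper names
(org.apache.hadoop.filecache.DistributedCache covers similar ground for
larger read-only files):

package org.apache.hadoop.examples;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class SideFileParam {
    // Hypothetical location; any HDFS path that both the submitting
    // client and the tasks can reach would do.
    private static final Path PARAM_FILE = new Path("/tmp/nTax.txt");

    // Call from main() before JobClient.runJob(conf).
    public static void writeParam(JobConf conf, String value)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.create(PARAM_FILE, true);  // overwrite
        out.writeBytes(value + "\n");
        out.close();
    }

    // Call from the reducer's configure(JobConf). Note configure() cannot
    // throw a checked exception, so wrap any IOException from here in a
    // RuntimeException there.
    public static int readParam(JobConf conf) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(PARAM_FILE)));
        try {
            return Integer.parseInt(in.readLine().trim());
        } finally {
            in.close();
        }
    }
}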