From: John Vines <vines@apache.org>
Reply-To: vines@apache.org
To: user@accumulo.apache.org
Date: Mon, 5 Nov 2012 09:13:39 -0500
Subject: RE: Accumulo Map Reduce is not distributed

So it sounds like the job was correctly set to 4 mappers and your issue is in your MapReduce configuration. I would check the jobtracker page and verify the number of map slots, as well as how they're running, as print statements are not the most accurate in the framework.

Sent from my phone, pardon the typos and brevity.

On Nov 5, 2012 8:59 AM, "Cornish, Duane C." <Duane.Cornish@jhuapl.edu> wrote:

> Hi William,
>
> Thanks for helping me out, and sorry I didn't get back to you sooner; I was away for the weekend. I am only calling ToolRunner.run once.
>
>     public static void ExtractFeaturesFromNewImages() throws Exception {
>         String[] parameters = new String[1];
>         parameters[0] = "foo";
>         InitializeFeatureExtractor();
>         ToolRunner.run(CachedConfiguration.getInstance(), new Accumulo_FE_MR_Job(), parameters);
>     }
>
> Another indicator that I'm only calling it once is that before I was pre-splitting the table, I was just getting one larger map-reduce job with only 1 mapper. Based on my print statements, the job was running in sequence (which I guess makes sense, because the table only existed on one node in my cluster). Then after pre-splitting my table, I was getting one job that had 4 mappers. Each was running one after the other. I hadn't changed any code (other than adding in the splits). So, I'm only calling ToolRunner.run once. Furthermore, my run function in my job class is provided below:
>
>     @Override
>     public int run(String[] arg0) throws Exception {
>         runOneTable();
>         return 0;
>     }
>
> Thanks,
> Duane
>
> From: William Slacum [mailto:wilhelm.von.cloud@accumulo.net]
> Sent: Friday, November 02, 2012 8:48 PM
> To: user@accumulo.apache.org
> Subject: Re: Accumulo Map Reduce is not distributed
>
> What about the main method that calls ToolRunner.run? If you have 4 jobs being created, then you're calling run(String[]) or runOneTable() 4 times.
>
> On Fri, Nov 2, 2012 at 5:21 PM, Cornish, Duane C. <Duane.Cornish@jhuapl.edu> wrote:
>
> Thanks for the prompt response John!
>
> When I say that I'm pre-splitting my table, I mean I am using the tableOperations().addSplits(table, splits) command. I have verified that this is correctly splitting my table into 4 tablets, and it is being distributed across my cloud before I start my map reduce job.
>
> Now, I only kick off the job once, but it appears that 4 separate jobs run (one after the other). The first one reaches 100% in its map phase (and, based on my output, only handled 1/4 of the data), then the next job starts at 0% and reaches 100%, and so on. So I think I'm "only running one mapper at a time in an MR job that has 4 mappers total." I have 2 mapper slots per node. My hadoop is set up so that one machine is the namenode and the other 3 are datanodes. This gives me 6 slots total. (This is not congruent to my accumulo, where the master is also a slave, giving 4 total slaves.)
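The pre-splitting described above is what gives the input format its 4 mappers: each split point added via tableOperations().addSplits ends one tablet and starts the next, so N split points yield N+1 tablets. The partitioning can be sketched in pure JDK code (a simplified model for illustration only, not Accumulo's actual implementation; the split values "g", "n", "t" are made up):

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class TabletModel {
    // Returns a label for the "tablet" a row key lands in: the range ending
    // at the smallest split point >= the row, or the unbounded last tablet.
    static String tabletFor(SortedSet<String> splits, String row) {
        for (String split : splits) {
            if (row.compareTo(split) <= 0) {
                return "(.., " + split + "]";
            }
        }
        return "(last tablet)";
    }

    public static void main(String[] args) {
        SortedSet<String> splits = new TreeSet<>();
        splits.add("g");
        splits.add("n");
        splits.add("t");

        System.out.println(tabletFor(splits, "c")); // prints "(.., g]"
        System.out.println(tabletFor(splits, "k")); // prints "(.., n]"
        System.out.println(tabletFor(splits, "z")); // prints "(last tablet)"
        // 3 split points => 4 tablets, hence 4 mappers for this table.
        System.out.println("tablets=" + (splits.size() + 1)); // prints "tablets=4"
    }
}
```

Which tablet a row lands in depends only on where its key sorts relative to the split points, which is why the 4 mappers each saw roughly a quarter of the data.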
>
> My map reduce job is not a chain job, so all 4 tablets should be able to run at the same time.
>
> Here is my job class code below:
>
>     import org.apache.accumulo.core.security.Authorizations;
>     import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat;
>     import org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat;
>     import org.apache.hadoop.conf.Configured;
>     import org.apache.hadoop.io.DoubleWritable;
>     import org.apache.hadoop.io.Text;
>     import org.apache.hadoop.mapreduce.Job;
>     import org.apache.hadoop.util.Tool;
>     import org.apache.log4j.Level;
>
>     public class Accumulo_FE_MR_Job extends Configured implements Tool {
>
>         private void runOneTable() throws Exception {
>             System.out.println("Running Map Reduce Feature Extraction Job");
>
>             Job job = new Job(getConf(), getClass().getName());
>
>             job.setJarByClass(getClass());
>             job.setJobName("MRFE");
>
>             job.setInputFormatClass(AccumuloRowInputFormat.class);
>             AccumuloRowInputFormat.setZooKeeperInstance(job.getConfiguration(),
>                     HMaxConstants.INSTANCE,
>                     HMaxConstants.ZOO_SERVERS);
>             AccumuloRowInputFormat.setInputInfo(job.getConfiguration(),
>                     HMaxConstants.USER,
>                     HMaxConstants.PASSWORD.getBytes(),
>                     HMaxConstants.FEATLESS_IMG_TABLE,
>                     new Authorizations());
>             AccumuloRowInputFormat.setLogLevel(job.getConfiguration(), Level.FATAL);
>
>             job.setMapperClass(AccumuloFEMapper.class);
>             job.setMapOutputKeyClass(Text.class);
>             job.setMapOutputValueClass(DoubleWritable.class);
>
>             job.setNumReduceTasks(4);
>             job.setReducerClass(AccumuloFEReducer.class);
>             job.setOutputKeyClass(Text.class);
>             job.setOutputValueClass(Text.class);
>
>             job.setOutputFormatClass(AccumuloOutputFormat.class);
>             AccumuloOutputFormat.setZooKeeperInstance(job.getConfiguration(),
>                     HMaxConstants.INSTANCE,
>                     HMaxConstants.ZOO_SERVERS);
>             AccumuloOutputFormat.setOutputInfo(job.getConfiguration(),
>                     HMaxConstants.USER,
>                     HMaxConstants.PASSWORD.getBytes(),
>                     true,
>                     HMaxConstants.ALL_IMG_TABLE);
>             AccumuloOutputFormat.setLogLevel(job.getConfiguration(), Level.FATAL);
>
>             job.waitForCompletion(true);
>             if (job.isSuccessful()) {
>                 System.err.println("Job Successful");
>             } else {
>                 System.err.println("Job Unsuccessful");
>             }
>         }
>
>         @Override
>         public int run(String[] arg0) throws Exception {
>             runOneTable();
>             return 0;
>         }
>     }
>
> Thanks,
> Duane
>
> From: John Vines [mailto:vines@apache.org]
> Sent: Friday, November 02, 2012 5:04 PM
> To: user@accumulo.apache.org
> Subject: Re: Accumulo Map Reduce is not distributed
>
> This sounds like an issue with how your MR environment is configured and/or how you're kicking off your mapreduce.
>
> Accumulo's input formats will automatically set the number of mappers to the number of tablets you have, so you should have seen your job go from 1 mapper to 4. What you describe is that you now do 4 MR jobs instead of just one, is that correct? Because that doesn't make a lot of sense, unless by presplitting your table you meant you now have 4 different support tables. Or do you mean that you're only running one mapper at a time in an MR job that has 4 mappers total?
>
> I believe it's somewhere in your kickoff that things may be a bit misconstrued.
> Just so I'm clear: how many mapper slots do you have per node, is your job a chain MR job, and do you mind sharing the code which sets up and kicks off your MR job, so I have an idea of what could be kicking off 4 jobs?
>
> John
>
> On Fri, Nov 2, 2012 at 4:53 PM, Cornish, Duane C. <Duane.Cornish@jhuapl.edu> wrote:
>
> Hello,
>
> I apologize if this discussion should be directed to a hadoop map reduce forum; however, I have some concern that my problem may be with my use of accumulo.
>
> I have a map reduce job that I want to run over data in a table. I have an index table and a support table which contains a subset of the data in the index table. I would like to map reduce over the support table on my small 4 node cluster.
>
> I have written a map reduce job that uses the AccumuloRowInputFormat class and sets the support table as its input table.
>
> In my mapper, I read in a row of the support table and make a call to a static function which pulls information out of the index table. Next, I use the data pulled back from the function call as input to a call to an external .so file that is stored on the name node. I then make another static function call to ingest the new data back into the index table. (I know I could emit this in the reduce step, but what I'm ingesting is formatted in a somewhat complex java object, and I already had a static function that ingested it the way I needed it.) My reduce step is completely empty.
>
> I output print statements from my mapper to see my progress. The problem that I'm getting is that my entire job appears to run in sequence, not in parallel. I am running it from the accumulo master on the 4 node system.
>
> I realized that my support table is very small and was not being split across any tablets. I am now presplitting this table across all 4 nodes. Now, when I run the map reduce job, it appears that 4 separate map reduce jobs run one after each other. The first map reduce job runs, gets to 100%, then the next map reduce job runs, etc. The job is only called once; why are there 4 jobs running? Why won't these jobs run in parallel?
>
> Is there any way to set the number of tasks that can run? This is possible from the hadoop command line; is it possible from the java API? Also, could my problem stem from the fact that during my mapper I am making static function calls to another class in my java project, accessing my accumulo index table, or making a call to an external .so library? I could restructure the job to avoid making static function calls, and I could write directly to the Accumulo table from my map reduce job if that would fix my problem. I can't avoid making the external .so library call. Any help would be greatly appreciated.
>
> Thanks,
> Duane
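On the "is there any way to set the number of tasks that can run" question: in Hadoop 1.x (current when this thread was written), the per-node map slot count John asks about is a tasktracker-side setting rather than a per-job one. A sketch of the relevant mapred-site.xml fragment (the value 2 matches the "2 mapper slots per node" described above; this is read at tasktracker startup, so changing it requires restarting the tasktrackers rather than anything in the job's Java API):

```xml
<!-- mapred-site.xml on each tasktracker node (Hadoop 1.x) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
```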
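For completeness, the arithmetic behind the thread's diagnosis: the number of map tasks that can run at once is bounded both by the job's mapper count and by the cluster's available map slots. A tiny model of that bound (illustrative only; real scheduling also depends on which tasktrackers have actually registered with the jobtracker, which is exactly why checking the jobtracker page matters):

```java
public class SlotMath {
    // Upper bound on concurrently running map tasks: a job cannot use more
    // slots than the cluster exposes, nor run more mappers than it has.
    static int concurrentMappers(int taskTrackers, int mapSlotsPerNode, int mappersInJob) {
        return Math.min(taskTrackers * mapSlotsPerNode, mappersInJob);
    }

    public static void main(String[] args) {
        // Duane's cluster: 3 datanodes running tasktrackers, 2 map slots each,
        // and a 4-mapper job: all 4 mappers should run at once.
        System.out.println(concurrentMappers(3, 2, 4)); // prints 4
        // If only one tasktracker had actually registered, the same job
        // would be limited to 2 mappers at a time.
        System.out.println(concurrentMappers(1, 2, 4)); // prints 2
    }
}
```

Seeing the 4 mappers run strictly one after another therefore suggests the cluster is effectively exposing a single slot, not that the job was configured with 4 separate runs.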