hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Got a problem using Hbase as a MR sink - program hangs in the reduce part
Date Fri, 02 Oct 2009 04:54:27 GMT
On Thu, Oct 1, 2009 at 9:30 PM, Taylor, Ronald C <ronald.taylor@pnl.gov> wrote:

> 1) yes, we are planning on switching to 0.20. Just haven't yet. So -
> that might be the first thing to do.
>
>
If you can, please do.  Big difference and 0.19.x is oh so six months old by
now.



> 2) re the # of reducers: at the start of my run fn, just after defining
> a jobConf object, I do a
>    jobConf.setNumReduceTasks(2)
>
> Wasn't sure if that setting was per node or for the entire 10-node
> cluster, so I also tried
>    jobConf.setNumReduceTasks(19)
>
> Didn't make any difference - program still failed at 66%
>
>
How many are running when you change the above?  2 in first case and 19 in
second?
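
The attempt IDs in the screen output quoted below carry the reduce task
number, so counting the distinct numbers gives a rough answer. A minimal
sketch, assuming the IDs keep the attempt_<job>_r_<task>_<attempt> shape seen
in that output; the class and method names are made up for illustration and
are not part of Hadoop:

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReduceTaskCounter {
    // Matches reduce attempt IDs such as attempt_200908131056_0004_r_000001_2
    // and captures the six-digit task number (assumed ID layout).
    private static final Pattern REDUCE_ATTEMPT =
        Pattern.compile("attempt_\\d+_\\d+_r_(\\d{6})_\\d+");

    // Returns the distinct reduce task numbers mentioned anywhere in the log text.
    static Set<String> distinctReduceTasks(String log) {
        Set<String> tasks = new TreeSet<>();
        Matcher m = REDUCE_ATTEMPT.matcher(log);
        while (m.find()) {
            tasks.add(m.group(1));
        }
        return tasks;
    }

    public static void main(String[] args) {
        // A few attempt IDs copied from the quoted screen output.
        String log = String.join("\n",
            "attempt_200908131056_0004_r_000000_0 failed to report status",
            "attempt_200908131056_0004_r_000001_0 failed to report status",
            "attempt_200908131056_0004_r_000000_1 failed to report status",
            "attempt_200908131056_0004_r_000001_2 failed to report status");
        System.out.println(distinctReduceTasks(log));
    }
}
```

In the quoted run only tasks 000000 and 000001 appear among the failed
attempts, which is what setNumReduceTasks(2) would give for the whole job.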

It's a small table?  Or a new table?  They might be beating up on one
region.  Better in 0.20.0.


>
>
> 4) re the debugging suggestions: noted, and I'll see what I can do.
>
> Thanks for the quick reply. I leave on a trip tomorrow morn, back next
> Thursday, so - I'll be working on this as soon as I get back.
>

Good stuff.  We'll be here when you get back Thurs.

Go easy,
St.Ack
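
The ten-minute figure can be checked against the client timestamps in the
output quoted below: reduce reaches 66% at 16:25:43 and drops back at
16:35:40, when the first kill is reported. A small sketch of that arithmetic,
assuming the yy/MM/dd HH:mm:ss stamp at the start of each JobClient line;
StallGap and secondsBetween are illustrative names only:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class StallGap {
    // Timestamp layout assumed from the JobClient lines, e.g. "09/10/01 16:25:43".
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yy/MM/dd HH:mm:ss");

    // Seconds elapsed between the leading timestamps of two log lines.
    static long secondsBetween(String earlierLine, String laterLine) {
        LocalDateTime start = LocalDateTime.parse(earlierLine.substring(0, 17), STAMP);
        LocalDateTime end = LocalDateTime.parse(laterLine.substring(0, 17), STAMP);
        return Duration.between(start, end).getSeconds();
    }

    public static void main(String[] args) {
        String reached66 = "09/10/01 16:25:43 INFO mapred.JobClient:  map 100% reduce 66%";
        String droppedBack = "09/10/01 16:35:40 INFO mapred.JobClient:  map 100% reduce 33%";
        System.out.println(secondsBetween(reached66, droppedBack) + " seconds");
    }
}
```

The gap comes out just under 600 seconds, in line with a ten-minute task
timeout (mapred.task.timeout, 600000 ms by default as far as I know) firing
on a reducer that never reported progress.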


>  Ron
>
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> stack
> Sent: Thursday, October 01, 2009 9:14 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Got a problem using Hbase as a MR sink - program hangs in
> the reduce part
>
> Can you run 0.20.0?
>
> 66% is when it starts writing hbase.
>
> How many reducers?
>
> Enable DEBUG (see FAQ for how).
>
> These are odd in that they are saying that the reduce task was dead --
> no progress reported -- over ten whole minutes:
>
> > attempt_200908131056_0004_r_000000_1 failed to report status for 603
> > seconds. Killing!
>
>
>
> Can you find that task in the MapReduce UI and see what was going on?
>
> You've read the 'Getting Started' where it talks about upping file
> descriptors, xceivers, and applying the HDFS-127 patch to your hadoop
> cluster?
>
> Yours,
> St.Ack
>
>
>
> On Thu, Oct 1, 2009 at 5:24 PM, Taylor, Ronald C
> <ronald.taylor@pnl.gov> wrote:
>
> >
> >  Hi folks,
> >
> > I am trying to run a simple MapReduce program that sums the number of
> > entries in a list in a column in an Hbase table and then places that
> > sum back into the table. Simple task, in theory - I am just trying out
> > MapReduce programming combined with Hbase use, i.e., using an Hbase
> > table as a data source and as a sink for the output.
> >
> > So - I get the screen error output below. The program fails at 66%
> > into reduce. Don't know why - I have rerun it and it fails at the same
> > point.
> > I am doing this on a 10-node Linux cluster using Hadoop 0.19.1 and
> > Hbase 0.19.3.
> >
> > I don't see any clues in the master Hbase and Hadoop logs. There are
> > no errors reported that I can see - though I cheerfully admit to
> > being a complete novice at interpreting the log output.
> >
> > I'm hoping this is something simple - perhaps some parameter I forgot
> > to set? I am hoping the screen output below might provide guidance to
> > somebody with more experience. Could very much use some help.
> >
> >  - Ron Taylor
> >
> > ___________________________________________
> > Ronald Taylor, Ph.D.
> > Computational Biology & Bioinformatics Group Pacific Northwest
> > National Laboratory
> > 902 Battelle Boulevard
> > P.O. Box 999, MSIN K7-90
> > Richland, WA  99352 USA
> > Office:  509-372-6568
> > Email: ronald.taylor@pnl.gov
> > www.pnl.gov
> >
> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >
> > Working in this directory:
> >
> > hadoop@neptune:/share/apps/RonWork/MR
> >
> > Command issued:
> >
> > /share/apps/hadoop/hadoop-0.19.1/bin/hadoop jar
> > jarredBinTableMRSummation.jar binTableMRSummation
> >
> > Screen output:
> >
> > 09/10/01 16:24:53 WARN mapred.JobClient: Use GenericOptionsParser for
> > parsing the arguments. Applications should implement Tool for the
> > same.
> > 09/10/01 16:24:53 INFO mapred.TableInputFormatBase: split:
> > 0->compute-0-0.local:,
> > 09/10/01 16:24:54 INFO mapred.JobClient: Running job:
> > job_200908131056_0004
> > 09/10/01 16:24:55 INFO mapred.JobClient:  map 0% reduce 0%
> > 09/10/01 16:25:27 INFO mapred.JobClient:  map 100% reduce 0%
> > 09/10/01 16:25:38 INFO mapred.JobClient:  map 100% reduce 33%
> > 09/10/01 16:25:43 INFO mapred.JobClient:  map 100% reduce 66%
> > 09/10/01 16:35:40 INFO mapred.JobClient:  map 100% reduce 33%
> > 09/10/01 16:35:41 INFO mapred.JobClient: Task Id :
> > attempt_200908131056_0004_r_000000_0, Status : FAILED Task
> > attempt_200908131056_0004_r_000000_0 failed to report status for 603
> > seconds. Killing!
> > 09/10/01 16:35:46 INFO mapred.JobClient:  map 100% reduce 0%
> > 09/10/01 16:35:46 INFO mapred.JobClient: Task Id :
> > attempt_200908131056_0004_r_000001_0, Status : FAILED Task
> > attempt_200908131056_0004_r_000001_0 failed to report status for 602
> > seconds. Killing!
> > 09/10/01 16:35:51 INFO mapred.JobClient:  map 100% reduce 33%
> > 09/10/01 16:35:56 INFO mapred.JobClient:  map 100% reduce 66%
> > 09/10/01 16:45:55 INFO mapred.JobClient:  map 100% reduce 33%
> > 09/10/01 16:45:55 INFO mapred.JobClient: Task Id :
> > attempt_200908131056_0004_r_000000_1, Status : FAILED Task
> > attempt_200908131056_0004_r_000000_1 failed to report status for 603
> > seconds. Killing!
> > 09/10/01 16:45:55 INFO mapred.JobClient: Task Id :
> > attempt_200908131056_0004_r_000000_2, Status : FAILED Task
> > attempt_200908131056_0004_r_000000_2 failed to report status for 603
> > seconds. Killing!
> > 09/10/01 16:46:00 INFO mapred.JobClient: Task Id :
> > attempt_200908131056_0004_r_000001_1, Status : FAILED Task
> > attempt_200908131056_0004_r_000001_1 failed to report status for 603
> > seconds. Killing!
> > 09/10/01 16:46:06 INFO mapred.JobClient: Task Id :
> > attempt_200908131056_0004_r_000001_2, Status : FAILED Task
> > attempt_200908131056_0004_r_000001_2 failed to report status for 603
> > seconds. Killing!
> >
> > <manually killed via control-C at this point>
> >
> > 09/10/01 16:46:15 INFO mapred.JobClient:  map 100% reduce 66%
> > Killed by signal 2.
> >
> >
> >
>
