hbase-user mailing list archives

From steven zhuang <steven.zhuang.1...@gmail.com>
Subject Re: Task attempt failed to report status
Date Sat, 06 Mar 2010 06:57:37 GMT
Thanks again, J-D.

      For my table, some rows may have as many as several million cells, which
is why we chose HBase over any RDBMS.

      And sorry, I didn't notice that I had pasted the tasktracker log in my
previous mail. I am checking the regionserver logs now, but I don't know which
regionserver will get the update request, so the checking is going really
slowly.
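
      In case it saves anyone else the hunt, here is a small, untested sketch
of how one could ask the client which regionserver serves a given row, so you
know whose log to read. This assumes the 0.19/0.20-era client API; "mytable"
is a placeholder for the real table name, and the row key is the one from the
log below:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionLocation;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FindRegionServer {
        public static void main(String[] args) throws Exception {
            // "mytable" is a placeholder; substitute the real table name
            HTable table = new HTable(new HBaseConfiguration(), "mytable");
            // ask the client which region (and thus which regionserver)
            // holds the row that was being committed when the task died
            HRegionLocation loc =
                table.getRegionLocation(Bytes.toBytes("cp_2010/01/18"));
            System.out.println("region: "
                + loc.getRegionInfo().getRegionNameAsString());
            // this is the regionserver whose log is worth digging into
            System.out.println("server: " + loc.getServerAddress());
        }
    }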

      My impression is that data update and delete operations are all done in
the memcache, so I really wonder how an update can take 10 minutes even if the
region is really big. I'm still checking the logs and will update this thread
if I find anything interesting.
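
      Since no status update can get through while the reducer thread itself
is blocked inside output.collect(), one thing I am going to try is simply
raising the task timeout. Again, just an untested sketch; the driver class
name and the 30-minute value are arbitrary placeholders:

    import org.apache.hadoop.mapred.JobConf;

    public class JobSetup {
        public static void main(String[] args) {
            JobConf conf = new JobConf(JobSetup.class);
            // the default mapred.task.timeout is 600000 ms, which matches
            // the "failed to report status for 602 seconds" in the log;
            // give a slow commit 30 minutes before the attempt is killed
            conf.set("mapred.task.timeout", "1800000");
            // ... set mapper, reducer, input/output as usual, then submit
        }
    }

      Another option would be a daemon thread in the reducer that calls
Reporter.progress() every 30 seconds while a long collect() is in flight, but
raising the timeout seems simpler.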


On Sat, Mar 6, 2010 at 2:33 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> How many cells do you normally output per row? It looks like a lot, the
> way you are doing it, and that might cause problems.
>
> Also, did you take a look at the regionserver logs? Any distress
> messages in there?
>
> J-D
>
> On Fri, Mar 5, 2010 at 1:12 AM, steven zhuang
> <steven.zhuang.1984@gmail.com> wrote:
> > hi, all,
> >
> >     When I import data into the HTable with a MapReduce job, the task runs
> > smoothly until the last reducer fails 6 times to report its status.
> >     In my program I collect a BatchUpdate for every 1000 cells and then
> > update the status. I don't think a normal insert would take 10 minutes,
> > because the first 99% of the job went smoothly; only the very last reducer
> > gets the "failed to report status" error.
> >
> >     I suspect the problem is that the regionserver is way too busy, which
> > causes the "output.collect(k, bu);" call to take too long to return, but I
> > am not sure because I don't know which regionserver is actually committing
> > the update.
> >
> >     So which log should I dig into? Any hint is appreciated.
> >
> >     The submitting portion of my code is as follows (just copied from some
> > online source and changed a little):
> >
> >     public void reduce(ImmutableBytesWritable k,
> >             Iterator<HbaseMapWritable<byte[], byte[]>> v,
> >             OutputCollector<ImmutableBytesWritable, BatchUpdate> output,
> >             Reporter r) throws IOException {
> >         while (v.hasNext()) {
> >             r.setStatus("Reducer begin committing row: "
> >                     + new String(k.get(), HConstants.UTF8_ENCODING)
> >                     + "  Time:" + new Date());
> >             BatchUpdate bu = new BatchUpdate(k.get());
> >             int cellCnt = 0;
> >             while (v.hasNext()) {
> >                 HbaseMapWritable<byte[], byte[]> hmw = v.next();
> >                 Iterator<Entry<byte[], byte[]>> iter = hmw.entrySet().iterator();
> >                 while (iter.hasNext()) {
> >                     Entry<byte[], byte[]> e = iter.next();
> >                     bu.put(e.getKey(), e.getValue());
> >                     // System.err.println("now add cell: " + e + " cell count: " + cellCnt + new Date());
> >                     // flush one BatchUpdate per 1000 cells
> >                     if (++cellCnt > 1000) {
> >                         output.collect(k, bu); // this line causes the timeout
> >                         r.setStatus("Reducer done committing "
> >                                 + new String(e.getKey(), HConstants.UTF8_ENCODING)
> >                                 + ":" + new String(e.getValue(), HConstants.UTF8_ENCODING)
> >                                 + "  Time:" + new Date());
> >                         bu = new BatchUpdate(k.get());
> >                         cellCnt = 0;
> >                     }
> >                 }
> >             }
> >             // flush whatever is left for this row
> >             if (cellCnt > 0) {
> >                 output.collect(k, bu);
> >             }
> >         }
> >     }
> >
> > the regionserver log is as follows:
> >
> > 2010-03-05 08:24:07,268 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0 0.33333334% reduce > sort
> > 2010-03-05 08:24:07,788 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0 0.6666667% reduce > reduce
> > 2010-03-05 08:24:10,795 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0 0.66668% Reducer done committing queries:tempurpedic complaints:3:3:3:3  Time:Fri Mar 05 08:24:10 GMT 2010 > reduce
> > 2010-03-05 08:24:13,806 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing queries:modeling agencies:1:1:1:1  Time:Fri Mar 05 08:24:13 GMT 2010 > reduce
> > 2010-03-05 08:24:16,810 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing queries:polaris accessories:2:2:2:2  Time:Fri Mar 05 08:24:16 GMT 2010 > reduce
> > 2010-03-05 08:24:19,832 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing queries:what was martin luther king jr s pants size:1:1:1:1  Time:Fri Mar 05 08:24:19 GMT 2010 > reduce
> > 2010-03-05 08:24:22,838 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing queries:nj npr stations:2:2:2:2  Time:Fri Mar 05 08:24:22 GMT 2010 > reduce
> > 2010-03-05 08:24:25,842 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing queries:stacked hair gallery:5:5:5:5  Time:Fri Mar 05 08:24:25 GMT 2010 > reduce
> > 2010-03-05 08:24:28,877 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing queries:school bus endorsement test:2:2:2:2  Time:Fri Mar 05 08:24:28 GMT 2010 > reduce
> > 2010-03-05 08:24:31,919 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing queries:spectacular from pretty ricky pictures:2:2:2:2  Time:Fri Mar 05 08:24:31 GMT 2010 > reduce
> > 2010-03-05 08:24:34,926 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0 0.6714777% Reducer done committing queries:sample tax return:2:2:2:2  Time:Fri Mar 05 08:24:34 GMT 2010 > reduce
> > 2010-03-05 08:24:37,929 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0 0.7544105% Reducer begin committing row: cp_2010/01/18  Time:Fri Mar 05 08:24:37 GMT 2010 > reduce
> > 2010-03-05 08:24:40,933 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0 0.7544105% Reducer begin committing row: cp_2010/01/18  Time:Fri Mar 05 08:24:37 GMT 2010 > reduce
> > 2010-03-05 08:34:43,448 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201002112137_0858_r_000066_0: Task attempt_201002112137_0858_r_000066_0 failed to report status for 602 seconds. Killing!
> > 2010-03-05 08:34:43,458 INFO org.apache.hadoop.mapred.TaskTracker: Process Thread Dump: lost task
> >
>
