hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From stchu <stchu.cl...@gmail.com>
Subject Re: problem: org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
Date Mon, 10 Aug 2009 09:27:57 GMT
Hi, J-D

My Hadoop version is 0.19.1 and HBase version is 0.19.3.
The cluster is consist of #1 master and #3 slaves with intel quad-core cpu
and 4GB RAM (32 bits Ubuntu8.10 server)
I tried import 2 dataset: 10GB Data (about 300,000,000 rows) and under 1GB
(30,000,000 rows) Data, but both of them fails.
The error messages are the same.
The hbase master log not shows error message but the jobtracker logs some
errors:
----------------------------------------------------------------------------------------------------------
...
2009-08-06 18:09:30,376 INFO org.apache.hadoop.mapred.JobInProgress: Task
'attempt_200908051649_0004_m_000149_0' has completed
task_200908051649_0004_m_000149 successfully.
2009-08-06 18:09:30,376 INFO org.apache.hadoop.mapred.ResourceEstimator:
measured blowup on task_200908051649_0004_m_000149 was 365348217/65533900 =
5.574950018234837
2009-08-06 18:09:30,376 INFO org.apache.hadoop.mapred.ResourceEstimator: new
estimate is blowup = 5.370451475418986
2009-08-06 18:09:42,707 INFO org.apache.hadoop.mapred.JobInProgress: Task
'attempt_200908051649_0004_m_000139_0' has completed
task_200908051649_0004_m_000139 successfully.
2009-08-06 18:09:42,707 INFO org.apache.hadoop.mapred.ResourceEstimator:
measured blowup on task_200908051649_0004_m_000139 was 350695808/65533900 =
5.351364835604168
2009-08-06 18:09:42,707 INFO org.apache.hadoop.mapred.ResourceEstimator: new
estimate is blowup = 5.370335093468896
2009-08-06 18:10:08,608 INFO org.apache.hadoop.mapred.JobTracker: Removed
completed task 'attempt_200908051649_0004_m_000162_0' from
'tracker_ITeS-clusterPC2:localhost/127.0.0.1:47481'
2009-08-06 18:14:42,073 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_200908051649_0004_r_000003_0:
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
to locate root region
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:786)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:465)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:474)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:478)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:115)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:98)
        at
org.apache.hadoop.hbase.mapred.TableOutputFormat.getRecordWriter(TableOutputFormat.java:90)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
        at org.apache.hadoop.mapred.Child.main(Child.java:158)
...
------------------------------------------------------------------------------------------------------

Would you please tell me what's the problem and how to resolve it?
Thanks a lot!

chu




2009/8/9 Jean-Daniel Cryans <jdcryans@apache.org>

> chu,
>
> Which version of HBase is this? How much data are you writing? What's
> your cluster like (#nodes, hardware, etc)?
>
> It would also be useful to see your master log around where it began
> to fail, you should see exceptions showing up.
>
> Thx,
>
> J-D
>
> On Sat, Aug 8, 2009 at 11:25 PM, stchu<stchu.cloud@gmail.com> wrote:
> > Hi,
> >
> > I tried to write the large amount of data into HBase. The map processes
> > could be completed, but while processing the reduce tasks, the job is
> > failed.
> > The error message is shown as follow:
> > ------------------------------------------------------------------------
> > ...
> > 09/08/07 19:45:47 INFO mapred.JobClient:  map 100% reduce 5%
> > 09/08/07 19:46:21 INFO mapred.JobClient:  map 100% reduce 6%
> > 09/08/07 19:46:27 INFO mapred.JobClient:  map 100% reduce 5%
> > java.io.IOException: Job failed!
> >    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
> >    at icl.atc.ites.hbase.PIndexCreator.run(PIndexCreator.java:571)
> >    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >    at icl.atc.ites.hbase.PIndexCreator.main(PIndexCreator.java:598)
> >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >    at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >    at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >    at java.lang.reflect.Method.invoke(Method.java:597)
> >    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
> >    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
> >    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> >
> > And the error message on web ui:
> >
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> > trying to locate root region
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:786)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:465)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:474)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:478)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
> >        at org.apache.hadoop.hbase.client.HTable.(HTable.java:115)
> >        at org.apache.hadoop.hbase.client.HTable.(HTable.java:98)
> >        at
> org.apache.hadoop.hbase.mapred.TableOutputFormat.getRecordWriter(TableOutputFormat.java:90)
> >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
> >        at org.apache.hadoop.mapred.Child.main(Child.java:158)
> >
> >
> > -------------------------------------------------------------------
> > And I checked both the userlog and the tasktracker log in one of the
> machine
> > (these two files are located at the machine #4 in my cluster):
> > Userlog:
> >
> -------------------------------------------------------------------------------------------------
> > ...
> >
> > 2009-08-06 18:23:25,712 INFO org.apache.hadoop.mapred.Merger: Down to
> > the last merge-pass, with 16 segments left of total size: 163329244
> > bytes
> > 2009-08-06 18:23:26,867 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> > and0 dup hosts)
> > 2009-08-06 18:23:27,531 INFO org.apache.hadoop.mapred.ReduceTask:
> > Shuffling 15322761 bytes (15322765 raw bytes) into RAM from
> > attempt_200908051649_0004_m_000155_0
> > 2009-08-06 18:23:31,360 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_200908051649_0004_r_000000_1 Merge of the 16 files in-memory
> > complete. Local file is
> >
> /home/chu/hadoop19/mapred/local/taskTracker/jobcache/job_200908051649_0004/attempt_200908051649_0004_r_000000_1/output/map_136.out
> > of size 163329218
> > 2009-08-06 18:23:31,360 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_200908051649_0004_r_000000_1 Thread waiting: Thread for
> > merging on-disk files
> > 2009-08-06 18:23:33,203 INFO org.apache.hadoop.mapred.ReduceTask: Read
> > 15322761 bytes from map-output for
> > attempt_200908051649_0004_m_000155_0
> > 2009-08-06 18:23:33,203 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> > #1 from attempt_200908051649_0004_m_000155_0 -> (32, 26) from
> > ITeS-clusterPC2
> > 2009-08-06 18:23:33,867 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> > and0 dup hosts)
> > 2009-08-06 18:23:34,232 INFO org.apache.hadoop.mapred.ReduceTask:
> > Shuffling 14110457 bytes (14110461 raw bytes) into RAM from
> > attempt_200908051649_0004_m_000153_0
> > 2009-08-06 18:23:42,491 INFO org.apache.hadoop.mapred.ReduceTask: Read
> > 14110457 bytes from map-output for
> > attempt_200908051649_0004_m_000153_0
> > 2009-08-06 18:23:42,491 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> > #1 from attempt_200908051649_0004_m_000153_0 -> (32, 26) from
> > ITeS-clusterPC2
> > 2009-08-06 18:23:43,866 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> > and0 dup hosts)
> > 2009-08-06 18:23:44,050 INFO org.apache.hadoop.mapred.ReduceTask:
> > Shuffling 13510680 bytes (13510684 raw bytes) into RAM from
> > attempt_200908051649_0004_m_000151_0
> > 2009-08-06 18:23:48,791 INFO org.apache.hadoop.mapred.ReduceTask: Read
> > 13510680 bytes from map-output for
> > attempt_200908051649_0004_m_000151_0
> > 2009-08-06 18:23:48,791 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> > #1 from attempt_200908051649_0004_m_000151_0 -> (32, 26) from
> > ITeS-clusterPC2
> > 2009-08-06 18:23:49,867 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> > and0 dup hosts)
> > 2009-08-06 18:23:50,194 INFO org.apache.hadoop.mapred.ReduceTask:
> > Shuffling 14813689 bytes (14813693 raw bytes) into RAM from
> > attempt_200908051649_0004_m_000154_0
> > 2009-08-06 18:23:57,301 INFO org.apache.hadoop.mapred.ReduceTask: Read
> > 14813689 bytes from map-output for
> > attempt_200908051649_0004_m_000154_0
> > 2009-08-06 18:23:57,301 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> > #1 from attempt_200908051649_0004_m_000154_0 -> (32, 26) from
> > ITeS-clusterPC2
> > 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
> > Closed ram manager
> > 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
> > Interleaved on-disk merge complete: 13 files left.
> > 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
> > In-memory merge complete: 5 files left.
> > 2009-08-06 18:23:57,869 INFO org.apache.hadoop.mapred.ReduceTask:
> > Keeping 5 segments, 73734381 bytes in memory for intermediate, on-disk
> > merge
> > 2009-08-06 18:23:57,872 INFO org.apache.hadoop.mapred.ReduceTask:
> > Merging 13 files, 2343086626 bytes from disk
> > 2009-08-06 18:23:58,671 INFO org.apache.hadoop.mapred.ReduceTask:
> > Merging 0 segments, 0 bytes from memory into reduce
> > 2009-08-06 18:23:58,671 INFO org.apache.hadoop.mapred.Merger: Merging
> > 18 sorted segments
> > 2009-08-06 18:23:59,962 INFO org.apache.hadoop.mapred.Merger: Merging
> > 9 intermediate segments out of a total of 18
> > 2009-08-06 18:27:44,177 INFO org.apache.hadoop.mapred.Merger: Down to
> > the last merge-pass, with 10 segments left of total size: 2343086558
> > bytes
> > 2009-08-06 18:29:04,667 ERROR
> > org.apache.hadoop.hbase.mapred.TableOutputFormat:
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> > trying to locate root region
> > 2009-08-06 18:29:04,699 WARN org.apache.hadoop.mapred.TaskTracker:
> > Error running child
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> > trying to locate root region
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:786)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:465)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:474)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:478)
> >        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
> >        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:115)
> >        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:98)
> >        at
> org.apache.hadoop.hbase.mapred.TableOutputFormat.getRecordWriter(TableOutputFormat.java:90)
> >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
> >        at org.apache.hadoop.mapred.Child.main(Child.java:158)
> >
> > 2009-08-06 18:29:04,716 INFO org.apache.hadoop.mapred.TaskRunner:
> > Runnning cleanup for the task
> >
> >
> >
> ------------------------------------------------------------------------------------------------------------
> >
> >  And tasklog:
> >
> > 2009-08-06 18:23:25,712 INFO org.apache.hadoop.mapred.Merger: Down to
> > the last merge-pass, with 16 segments left of total size: 163329244
> > bytes
> > 2009-08-06 18:23:26,867 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> > and0 dup hosts)
> > 2009-08-06 18:23:27,531 INFO org.apache.hadoop.mapred.ReduceTask:
> > Shuffling 15322761 bytes (15322765 raw bytes) into RAM from
> > attempt_200908051649_0004_m_000155_0
> > 2009-08-06 18:23:31,360 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_200908051649_0004_r_000000_1 Merge of the 16 files in-memory
> > complete. Local file is
> >
> /home/chu/hadoop19/mapred/local/taskTracker/jobcache/job_200908051649_0004/attempt_200908051649_0004_r_000000_1/output/map_136.out
> > of size 163329218
> > 2009-08-06 18:23:31,360 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_200908051649_0004_r_000000_1 Thread waiting: Thread for
> > merging on-disk files
> > 2009-08-06 18:23:33,203 INFO org.apache.hadoop.mapred.ReduceTask: Read
> > 15322761 bytes from map-output for
> > attempt_200908051649_0004_m_000155_0
> > 2009-08-06 18:23:33,203 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> > #1 from attempt_200908051649_0004_m_000155_0 -> (32, 26) from
> > ITeS-clusterPC2
> > 2009-08-06 18:23:33,867 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> > and0 dup hosts)
> > 2009-08-06 18:23:34,232 INFO org.apache.hadoop.mapred.ReduceTask:
> > Shuffling 14110457 bytes (14110461 raw bytes) into RAM from
> > attempt_200908051649_0004_m_000153_0
> > 2009-08-06 18:23:42,491 INFO org.apache.hadoop.mapred.ReduceTask: Read
> > 14110457 bytes from map-output for
> > attempt_200908051649_0004_m_000153_0
> > 2009-08-06 18:23:42,491 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> > #1 from attempt_200908051649_0004_m_000153_0 -> (32, 26) from
> > ITeS-clusterPC2
> > 2009-08-06 18:23:43,866 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> > and0 dup hosts)
> > 2009-08-06 18:23:44,050 INFO org.apache.hadoop.mapred.ReduceTask:
> > Shuffling 13510680 bytes (13510684 raw bytes) into RAM from
> > attempt_200908051649_0004_m_000151_0
> > 2009-08-06 18:23:48,791 INFO org.apache.hadoop.mapred.ReduceTask: Read
> > 13510680 bytes from map-output for
> > attempt_200908051649_0004_m_000151_0
> > 2009-08-06 18:23:48,791 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> > #1 from attempt_200908051649_0004_m_000151_0 -> (32, 26) from
> > ITeS-clusterPC2
> > 2009-08-06 18:23:49,867 INFO org.apache.hadoop.mapred.ReduceTask:
> > attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
> > and0 dup hosts)
> > 2009-08-06 18:23:50,194 INFO org.apache.hadoop.mapred.ReduceTask:
> > Shuffling 14813689 bytes (14813693 raw bytes) into RAM from
> > attempt_200908051649_0004_m_000154_0
> > 2009-08-06 18:23:57,301 INFO org.apache.hadoop.mapred.ReduceTask: Read
> > 14813689 bytes from map-output for
> > attempt_200908051649_0004_m_000154_0
> > 2009-08-06 18:23:57,301 INFO org.apache.hadoop.mapred.ReduceTask: Rec
> > #1 from attempt_200908051649_0004_m_000154_0 -> (32, 26) from
> > ITeS-clusterPC2
> > 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
> > Closed ram manager
> > 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
> > Interleaved on-disk merge complete: 13 files left.
> > 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
> > In-memory merge complete: 5 files left.
> > 2009-08-06 18:23:57,869 INFO org.apache.hadoop.mapred.ReduceTask:
> > Keeping 5 segments, 73734381 bytes in memory for intermediate, on-disk
> > merge
> > 2009-08-06 18:23:57,872 INFO org.apache.hadoop.mapred.ReduceTask:
> > Merging 13 files, 2343086626 bytes from disk
> > 2009-08-06 18:23:58,671 INFO org.apache.hadoop.mapred.ReduceTask:
> > Merging 0 segments, 0 bytes from memory into reduce
> > 2009-08-06 18:23:58,671 INFO org.apache.hadoop.mapred.Merger: Merging
> > 18 sorted segments
> > 2009-08-06 18:23:59,962 INFO org.apache.hadoop.mapred.Merger: Merging
> > 9 intermediate segments out of a total of 18
> > 2009-08-06 18:27:44,177 INFO org.apache.hadoop.mapred.Merger: Down to
> > the last merge-pass, with 10 segments left of total size: 2343086558
> > bytes
> > 2009-08-06 18:29:04,667 ERROR
> > org.apache.hadoop.hbase.mapred.TableOutputFormat:
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> > trying to locate root region
> >
> >
> -----------------------------------------------------------------------------------------------------------------------------------------------
> >
> > The error seems occured in the sort stage of the reduce task. It never
> > occurs if I write the small data into HBase.
> > Can anyone give me a help? Thank you very much!
> >
> >
> > chu
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message