hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: problem: org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
Date Mon, 10 Aug 2009 22:12:13 GMT
chu,

OK, I would suggest that you upgrade to the latest from the 0.19
branch, which you can find here:
http://svn.apache.org/repos/asf/hadoop/hbase/branches/0.19/
You can compile it with "ant package".
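For reference, a minimal checkout-and-build sketch (assuming Subversion and Ant are on the PATH; the destination directory name is arbitrary):

```shell
# Check out the 0.19 branch and build a packaged release from it.
svn checkout http://svn.apache.org/repos/asf/hadoop/hbase/branches/0.19/ hbase-0.19
cd hbase-0.19
ant package    # the packaged build ends up under build/
```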

HDFS is particularly troublesome with very small clusters like yours.
The exception you got is symptomatic of that, but since it is a
client-side exception I can't see the real reason behind your problem.
If you could kindly go into the region server logs and see if there is
any exception coming from HDFS, that would be great. Another thing
would be to lower the number of map and reduce slots from 4 to 3 or 2,
because HBase will eat 1 CPU.
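For example, in Hadoop 0.19 the per-tasktracker slot counts are set in
hadoop-site.xml with the following properties (the values here are only
illustrative; tune them to your hardware):

```xml
<!-- hadoop-site.xml: leave a core free for HBase -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>3</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>3</value>
</property>
```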

Another solution would be to upgrade to the latest from the 0.20
branch; it is very fast, has loads of fixes, and is near release.

J-D

On Mon, Aug 10, 2009 at 2:27 AM, stchu<stchu.cloud@gmail.com> wrote:
> Hi, J-D
>
> My Hadoop version is 0.19.1 and HBase version is 0.19.3.
> The cluster consists of 1 master and 3 slaves, each with an Intel
> quad-core CPU and 4 GB RAM (32-bit Ubuntu 8.10 Server).
> I tried to import 2 datasets: 10 GB of data (about 300,000,000 rows) and
> one under 1 GB (30,000,000 rows), but both of them fail.
> The error messages are the same.
> The HBase master log shows no error message, but the jobtracker logs some
> errors:
> ----------------------------------------------------------------------------------------------------------
> ...
> 2009-08-06 18:09:30,376 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_200908051649_0004_m_000149_0' has completed
> task_200908051649_0004_m_000149 successfully.
> 2009-08-06 18:09:30,376 INFO org.apache.hadoop.mapred.ResourceEstimator:
> measured blowup on task_200908051649_0004_m_000149 was 365348217/65533900 =
> 5.574950018234837
> 2009-08-06 18:09:30,376 INFO org.apache.hadoop.mapred.ResourceEstimator: new
> estimate is blowup = 5.370451475418986
> 2009-08-06 18:09:42,707 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_200908051649_0004_m_000139_0' has completed
> task_200908051649_0004_m_000139 successfully.
> 2009-08-06 18:09:42,707 INFO org.apache.hadoop.mapred.ResourceEstimator:
> measured blowup on task_200908051649_0004_m_000139 was 350695808/65533900 =
> 5.351364835604168
> 2009-08-06 18:09:42,707 INFO org.apache.hadoop.mapred.ResourceEstimator: new
> estimate is blowup = 5.370335093468896
> 2009-08-06 18:10:08,608 INFO org.apache.hadoop.mapred.JobTracker: Removed
> completed task 'attempt_200908051649_0004_m_000162_0' from
> 'tracker_ITeS-clusterPC2:localhost/127.0.0.1:47481'
> 2009-08-06 18:14:42,073 INFO org.apache.hadoop.mapred.TaskInProgress: Error
> from attempt_200908051649_0004_r_000003_0:
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
> to locate root region
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:786)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:465)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:474)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:478)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
>        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:115)
>        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:98)
>        at
> org.apache.hadoop.hbase.mapred.TableOutputFormat.getRecordWriter(TableOutputFormat.java:90)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
>        at org.apache.hadoop.mapred.Child.main(Child.java:158)
> ...
> ------------------------------------------------------------------------------------------------------
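As a sanity check on the ResourceEstimator lines in the log above: the
"blowup" figure is simply measured map output bytes divided by input
bytes, so the logged numbers can be reproduced directly:

```python
# ResourceEstimator "blowup" = map output bytes / map input bytes,
# using the figures from task_200908051649_0004_m_000149 above.
blowup = 365348217 / 65533900
print(round(blowup, 6))  # 5.57495, matching the logged 5.574950018234837
```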
>
> Would you please tell me what the problem is and how to resolve it?
> Thanks a lot!
>
> chu
>
>
>
>
> 2009/8/9 Jean-Daniel Cryans <jdcryans@apache.org>
>
>> chu,
>>
>> Which version of HBase is this? How much data are you writing? What's
>> your cluster like (#nodes, hardware, etc)?
>>
>> It would also be useful to see your master log around where it began
>> to fail, you should see exceptions showing up.
>>
>> Thx,
>>
>> J-D
>>
>> On Sat, Aug 8, 2009 at 11:25 PM, stchu<stchu.cloud@gmail.com> wrote:
>> > Hi,
>> >
>> > I tried to write a large amount of data into HBase. The map tasks
>> > complete, but while processing the reduce tasks, the job
>> > fails.
>> > The error message is shown as follows:
>> > ------------------------------------------------------------------------
>> > ...
>> > 09/08/07 19:45:47 INFO mapred.JobClient:  map 100% reduce 5%
>> > 09/08/07 19:46:21 INFO mapred.JobClient:  map 100% reduce 6%
>> > 09/08/07 19:46:27 INFO mapred.JobClient:  map 100% reduce 5%
>> > java.io.IOException: Job failed!
>> >    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>> >    at icl.atc.ites.hbase.PIndexCreator.run(PIndexCreator.java:571)
>> >    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >    at icl.atc.ites.hbase.PIndexCreator.main(PIndexCreator.java:598)
>> >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >    at
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >    at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >    at java.lang.reflect.Method.invoke(Method.java:597)
>> >    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>> >    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>> >    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>> >    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>> >
>> > And the error message on web ui:
>> >
>> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > trying to locate root region
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:786)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:465)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:474)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:478)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
>> >        at org.apache.hadoop.hbase.client.HTable.(HTable.java:115)
>> >        at org.apache.hadoop.hbase.client.HTable.(HTable.java:98)
>> >        at
>> org.apache.hadoop.hbase.mapred.TableOutputFormat.getRecordWriter(TableOutputFormat.java:90)
>> >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
>> >        at org.apache.hadoop.mapred.Child.main(Child.java:158)
>> >
>> >
>> > -------------------------------------------------------------------
>> > And I checked both the userlog and the tasktracker log on one of the
>> machines
>> > (these two files are located on machine #4 in my cluster):
>> > Userlog:
>> >
>> -------------------------------------------------------------------------------------------------
>> > ...
>> >
>> > 2009-08-06 18:23:25,712 INFO org.apache.hadoop.mapred.Merger: Down to
>> > the last merge-pass, with 16 segments left of total size: 163329244
>> > bytes
>> > 2009-08-06 18:23:26,867 INFO org.apache.hadoop.mapred.ReduceTask:
>> > attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
>> > and0 dup hosts)
>> > 2009-08-06 18:23:27,531 INFO org.apache.hadoop.mapred.ReduceTask:
>> > Shuffling 15322761 bytes (15322765 raw bytes) into RAM from
>> > attempt_200908051649_0004_m_000155_0
>> > 2009-08-06 18:23:31,360 INFO org.apache.hadoop.mapred.ReduceTask:
>> > attempt_200908051649_0004_r_000000_1 Merge of the 16 files in-memory
>> > complete. Local file is
>> >
>> /home/chu/hadoop19/mapred/local/taskTracker/jobcache/job_200908051649_0004/attempt_200908051649_0004_r_000000_1/output/map_136.out
>> > of size 163329218
>> > 2009-08-06 18:23:31,360 INFO org.apache.hadoop.mapred.ReduceTask:
>> > attempt_200908051649_0004_r_000000_1 Thread waiting: Thread for
>> > merging on-disk files
>> > 2009-08-06 18:23:33,203 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> > 15322761 bytes from map-output for
>> > attempt_200908051649_0004_m_000155_0
>> > 2009-08-06 18:23:33,203 INFO org.apache.hadoop.mapred.ReduceTask: Rec
>> > #1 from attempt_200908051649_0004_m_000155_0 -> (32, 26) from
>> > ITeS-clusterPC2
>> > 2009-08-06 18:23:33,867 INFO org.apache.hadoop.mapred.ReduceTask:
>> > attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
>> > and0 dup hosts)
>> > 2009-08-06 18:23:34,232 INFO org.apache.hadoop.mapred.ReduceTask:
>> > Shuffling 14110457 bytes (14110461 raw bytes) into RAM from
>> > attempt_200908051649_0004_m_000153_0
>> > 2009-08-06 18:23:42,491 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> > 14110457 bytes from map-output for
>> > attempt_200908051649_0004_m_000153_0
>> > 2009-08-06 18:23:42,491 INFO org.apache.hadoop.mapred.ReduceTask: Rec
>> > #1 from attempt_200908051649_0004_m_000153_0 -> (32, 26) from
>> > ITeS-clusterPC2
>> > 2009-08-06 18:23:43,866 INFO org.apache.hadoop.mapred.ReduceTask:
>> > attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
>> > and0 dup hosts)
>> > 2009-08-06 18:23:44,050 INFO org.apache.hadoop.mapred.ReduceTask:
>> > Shuffling 13510680 bytes (13510684 raw bytes) into RAM from
>> > attempt_200908051649_0004_m_000151_0
>> > 2009-08-06 18:23:48,791 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> > 13510680 bytes from map-output for
>> > attempt_200908051649_0004_m_000151_0
>> > 2009-08-06 18:23:48,791 INFO org.apache.hadoop.mapred.ReduceTask: Rec
>> > #1 from attempt_200908051649_0004_m_000151_0 -> (32, 26) from
>> > ITeS-clusterPC2
>> > 2009-08-06 18:23:49,867 INFO org.apache.hadoop.mapred.ReduceTask:
>> > attempt_200908051649_0004_r_000000_1 Scheduled 1 outputs (0 slow hosts
>> > and0 dup hosts)
>> > 2009-08-06 18:23:50,194 INFO org.apache.hadoop.mapred.ReduceTask:
>> > Shuffling 14813689 bytes (14813693 raw bytes) into RAM from
>> > attempt_200908051649_0004_m_000154_0
>> > 2009-08-06 18:23:57,301 INFO org.apache.hadoop.mapred.ReduceTask: Read
>> > 14813689 bytes from map-output for
>> > attempt_200908051649_0004_m_000154_0
>> > 2009-08-06 18:23:57,301 INFO org.apache.hadoop.mapred.ReduceTask: Rec
>> > #1 from attempt_200908051649_0004_m_000154_0 -> (32, 26) from
>> > ITeS-clusterPC2
>> > 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
>> > Closed ram manager
>> > 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
>> > Interleaved on-disk merge complete: 13 files left.
>> > 2009-08-06 18:23:57,867 INFO org.apache.hadoop.mapred.ReduceTask:
>> > In-memory merge complete: 5 files left.
>> > 2009-08-06 18:23:57,869 INFO org.apache.hadoop.mapred.ReduceTask:
>> > Keeping 5 segments, 73734381 bytes in memory for intermediate, on-disk
>> > merge
>> > 2009-08-06 18:23:57,872 INFO org.apache.hadoop.mapred.ReduceTask:
>> > Merging 13 files, 2343086626 bytes from disk
>> > 2009-08-06 18:23:58,671 INFO org.apache.hadoop.mapred.ReduceTask:
>> > Merging 0 segments, 0 bytes from memory into reduce
>> > 2009-08-06 18:23:58,671 INFO org.apache.hadoop.mapred.Merger: Merging
>> > 18 sorted segments
>> > 2009-08-06 18:23:59,962 INFO org.apache.hadoop.mapred.Merger: Merging
>> > 9 intermediate segments out of a total of 18
>> > 2009-08-06 18:27:44,177 INFO org.apache.hadoop.mapred.Merger: Down to
>> > the last merge-pass, with 10 segments left of total size: 2343086558
>> > bytes
>> > 2009-08-06 18:29:04,667 ERROR
>> > org.apache.hadoop.hbase.mapred.TableOutputFormat:
>> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > trying to locate root region
>> > 2009-08-06 18:29:04,699 WARN org.apache.hadoop.mapred.TaskTracker:
>> > Error running child
>> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > trying to locate root region
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:786)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:465)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:474)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:447)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:575)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:478)
>> >        at
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:440)
>> >        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:115)
>> >        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:98)
>> >        at
>> org.apache.hadoop.hbase.mapred.TableOutputFormat.getRecordWriter(TableOutputFormat.java:90)
>> >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:404)
>> >        at org.apache.hadoop.mapred.Child.main(Child.java:158)
>> >
>> > 2009-08-06 18:29:04,716 INFO org.apache.hadoop.mapred.TaskRunner:
>> > Runnning cleanup for the task
>> >
>> >
>> >
>> ------------------------------------------------------------------------------------------------------------
>> >
>> >  And tasklog:
>> >
>> > [tasklog omitted: identical to the userlog excerpt above, up to and
>> > including the same NoServerForRegionException ERROR line]
>> >
>> >
>> -----------------------------------------------------------------------------------------------------------------------------------------------
>> >
>> > The error seems to occur in the sort stage of the reduce task. It never
>> > occurs if I write a small amount of data into HBase.
>> > Can anyone help me? Thank you very much!
>> >
>> >
>> > chu
>> >
>>
>
