hadoop-common-user mailing list archives

From "Kareem Dana" <kareem.d...@gmail.com>
Subject Re: HBase PerformanceEvaluation failing
Date Fri, 16 Nov 2007 01:31:48 GMT
My DFS appears healthy. After the PE fails, the datanodes are still
running, but all the HRegionServers have exited. My first suspicion is
free disk space or memory. Each node has ~1.5 GB of free space for
DFS and 400 MB RAM / 256 MB swap. Is this enough for the PE? I
monitored free space while the PE ran, and it never completely filled
up, but it is tight.
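To make "tight" concrete, it may help to compare each datanode's usable space against a margin of whole blocks rather than against zero, on the assumption that a node can be passed over for new blocks well before its disk is literally full. The sketch below assumes a 64 MB block size and a hypothetical margin of five blocks; both are illustrative parameters, not values taken from this thread, so check them against your own configuration.

```java
import java.io.File;

/** Sketch: does a directory's free space exceed a safety margin of whole blocks? */
public class FreeSpaceCheck {

    /**
     * True when usableBytes covers at least minBlocks blocks of blockSize bytes.
     * minBlocks is a hypothetical safety margin, not a Hadoop setting.
     */
    static boolean hasRoomForBlocks(long usableBytes, long blockSize, int minBlocks) {
        return usableBytes >= blockSize * (long) minBlocks;
    }

    public static void main(String[] args) {
        // Pass the datanode's local storage directory; /tmp is only a fallback
        // (hadoop.tmp.dir defaults to a directory under /tmp).
        File dir = new File(args.length > 0 ? args[0] : "/tmp");
        long usable = dir.getUsableSpace();      // bytes usable by this JVM; 0 if the path does not exist
        long blockSize = 64L * 1024 * 1024;      // assumed block size: 64 MB
        int minBlocks = 5;                       // hypothetical margin

        System.out.printf("%s: %d MB usable%n", dir, usable / (1024 * 1024));
        if (!hasRoomForBlocks(usable, blockSize, minBlocks)) {
            System.out.println("WARNING: less than " + minBlocks + " blocks of free space");
        }
    }
}
```

Running it periodically on each datanode during the PE (for example `java FreeSpaceCheck /path/to/dfs/data`) shows how close each node gets to the margin, not just whether the disk fills.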


On Nov 15, 2007 8:01 PM, stack <stack@duboce.net> wrote:
> Your DFS is healthy?  This seems odd: "File
> /tmp/hadoop-kcd/hbase/hregion_TestTable,2102165,6843477525281170954/info/mapfiles/6464987859396543981/data could
> only be replicated to 0 nodes, instead of 1;"  IIRC, it means no
> datanodes are running.
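A toy model can show how "replicated to 0 nodes" might appear even while datanode processes are still up: the namenode can only place a block on nodes it currently considers usable. The sketch below assumes a node is usable when it is alive and has at least a hypothetical minimum of bytes remaining; the `Node` class, the threshold, and the selection rule are illustrative assumptions, not the namenode's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of choosing replication targets; not Hadoop's implementation. */
public class TargetModel {

    /** Hypothetical view of one datanode as the namenode might see it. */
    static final class Node {
        final String name;
        final boolean alive;        // heartbeat recently received
        final long remainingBytes;  // space reported as free for DFS

        Node(String name, boolean alive, long remainingBytes) {
            this.name = name;
            this.alive = alive;
            this.remainingBytes = remainingBytes;
        }
    }

    /** Returns up to `replication` nodes that are alive and have at least `minRemaining` bytes free. */
    static List<Node> chooseTargets(List<Node> nodes, int replication, long minRemaining) {
        List<Node> chosen = new ArrayList<>();
        for (Node n : nodes) {
            if (chosen.size() == replication) {
                break;
            }
            if (n.alive && n.remainingBytes >= minRemaining) {
                chosen.add(n);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        List<Node> nodes = List.of(
            new Node("hadoop08", true, 200 * mb),   // alive but below the hypothetical threshold
            new Node("hadoop09", false, 900 * mb)); // plenty of space but not heartbeating
        int replication = 1;
        List<Node> targets = chooseTargets(nodes, replication, 320 * mb);
        if (targets.size() < replication) {
            // prints "could only be replicated to 0 nodes, instead of 1"
            System.out.println("could only be replicated to " + targets.size()
                + " nodes, instead of " + replication);
        }
    }
}
```

In this model, a dead datanode and a live but nearly full datanode produce the same error, which is why checking both liveness and remaining space is worthwhile.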
>
> (I just tried the PE from TRUNK and it ran to completion).
>
> St.Ack
>
>
> Kareem Dana wrote:
> > I'm trying to run the HBase PerformanceEvaluation program on a cluster
> > of 5 hadoop nodes (on virtual machines).
> >
> > hadoop07 is a DFS Master and HBase master
> > hadoop08-12 are HBase region servers
> >
> > I start the test as follows:
> >
> > $ bin/hadoop jar
> > ${HADOOP_HOME}build/contrib/hbase/hadoop-0.15.0-dev-hbase-test.jar
> > sequentialWrite 2
> >
> > This starts the sequentialWrite test with 2 clients. After about 25
> > minutes, with the map tasks about 25% complete and reduce at 6%, the
> > test fails with the following error:
> > 2007-11-15 17:06:35,100 INFO org.apache.hadoop.mapred.TaskInProgress:
> > TaskInProgress tip_200711151626_0001_m_000002 has failed 1 times.
> > 2007-11-15 17:06:35,100 INFO org.apache.hadoop.mapred.JobInProgress:
> > Aborting job job_200711151626_0001
> > 2007-11-15 17:06:35,101 INFO org.apache.hadoop.mapred.TaskInProgress:
> > Error from task_200711151626_0001_m_000006_0:
> > org.apache.hadoop.hbase.NoServerForRegionException: failed to find
> > server for TestTable after 5 retries
> >       at org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:761)
> >       at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:521)
> >       at org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:317)
> >       at org.apache.hadoop.hbase.HTable.commit(HTable.java:671)
> >       at org.apache.hadoop.hbase.HTable.commit(HTable.java:636)
> >       at org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest.testRow(PerformanceEvaluation.java:493)
> >       at org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:356)
> >       at org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:529)
> >       at org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:184)
> >       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
> >       at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >
> >
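The client-side exception describes a bounded lookup: the client keeps asking where `TestTable`'s region lives and gives up after a fixed number of attempts, so it fails once the region servers have exited and nothing comes back in time. A minimal sketch of that pattern follows. The `lookup` supplier and the pause are hypothetical stand-ins for the meta-region scan; the real `HConnectionManager` logic differs in detail.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Sketch of a bounded retry loop; `lookup` stands in for a hypothetical region lookup. */
public class BoundedRetry {

    static <T> T retry(Supplier<Optional<T>> lookup, int maxRetries, long pauseMillis) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result.get();
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(pauseMillis); // give a restarting server time to come back
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", e);
                }
            }
        }
        throw new IllegalStateException("failed to find server after " + maxRetries + " retries");
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical lookup that only succeeds on the third call.
        Supplier<Optional<String>> lookup =
            () -> ++calls[0] >= 3 ? Optional.of("172.16.6.57:60020") : Optional.empty();
        String server = retry(lookup, 5, 10);
        // prints "172.16.6.57:60020 found after 3 attempts"
        System.out.println(server + " found after " + calls[0] + " attempts");
    }
}
```

Raising the retry count or pause only helps if a server eventually comes back. Here the region servers had already exited, so the retries were bound to run out.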
> > An HBase region server log shows these errors:
> > 2007-11-15 17:03:00,017 ERROR org.apache.hadoop.hbase.HRegionServer:
> > error closing region TestTable,2102165,6843477525281170954
> > org.apache.hadoop.hbase.DroppedSnapshotException: java.io.IOException:
> > File /tmp/hadoop-kcd/hbase/hregion_TestTable,2102165,6843477525281170954/info/mapfiles/6464987859396543981/data
> > could only be replicated to 0 nodes, instead of 1
> >         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1003)
> >         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
> >         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:585)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> >
> >         at org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:886)
> >         at org.apache.hadoop.hbase.HRegion.close(HRegion.java:388)
> >         at org.apache.hadoop.hbase.HRegionServer.closeAllRegions(HRegionServer.java:978)
> >         at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:593)
> >         at java.lang.Thread.run(Thread.java:595)
> > 2007-11-15 17:03:00,615 ERROR org.apache.hadoop.hbase.HRegionServer:
> > error closing region TestTable,3147654,8929124532081908894
> > org.apache.hadoop.hbase.DroppedSnapshotException: java.io.IOException:
> > File /tmp/hadoop-kcd/hbase/hregion_TestTable,3147654,8929124532081908894/info/mapfiles/3451857497397493742/data
> > could only be replicated to 0 nodes, instead of 1
> >         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1003)
> >         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
> >         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:585)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> >
> >         at org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:886)
> >         at org.apache.hadoop.hbase.HRegion.close(HRegion.java:388)
> >         at org.apache.hadoop.hbase.HRegionServer.closeAllRegions(HRegionServer.java:978)
> >         at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:593)
> >         at java.lang.Thread.run(Thread.java:595)
> > 2007-11-15 17:03:00,639 ERROR org.apache.hadoop.hbase.HRegionServer:
> > Close and delete failed
> > java.io.IOException: java.io.IOException: File
> > /tmp/hadoop-kcd/hbase/log_172.16.6.57_-3889232888673408171_60020/hlog.dat.005
> > could only be replicated to 0 nodes, instead of 1
> >         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1003)
> >         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
> >         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:585)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> >
> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> >         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> >         at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
> >         at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
> >         at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
> >         at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:597)
> >         at java.lang.Thread.run(Thread.java:595)
> > 2007-11-15 17:03:00,640 INFO org.apache.hadoop.hbase.HRegionServer:
> > telling master that region server is shutting down at:
> > 172.16.6.57:60020
> > 2007-11-15 17:03:00,643 INFO org.apache.hadoop.hbase.HRegionServer:
> > stopping server at: 172.16.6.57:60020
> > 2007-11-15 17:03:00,643 INFO org.apache.hadoop.hbase.HRegionServer:
> > regionserver/0.0.0.0:60020 exiting
> >
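One way to track this down across several region server logs is to extract only the DFS paths that failed replication, then check whether the failures hit memcache flushes (`mapfiles`), the write-ahead log (`hlog.dat`), or both. The sketch below assumes the message format shown above and tolerates a line break between the path and "could only be replicated". The class name and the categories are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: list DFS files that "could only be replicated to N nodes" in a log. */
public class ReplicationFailureScan {

    // "File <path> could only be replicated to N nodes"; \s+ allows a line break.
    private static final Pattern FAILED =
        Pattern.compile("File\\s+(\\S+)\\s+could only be replicated to \\d+ nodes");

    /** Distinct failed paths, in order of first appearance. */
    static Set<String> failedPaths(String logText) {
        Set<String> paths = new LinkedHashSet<>();
        Matcher m = FAILED.matcher(logText);
        while (m.find()) {
            paths.add(m.group(1));
        }
        return paths;
    }

    /** Rough category based on the path layout seen in this thread. */
    static String kind(String path) {
        if (path.contains("/mapfiles/")) return "MAPFILE"; // flushed store file
        if (path.contains("/hlog.dat")) return "HLOG";     // write-ahead log
        return "OTHER";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReplicationFailureScan <regionserver-log>");
            return;
        }
        String text = new String(Files.readAllBytes(Paths.get(args[0])));
        for (String path : failedPaths(text)) {
            System.out.println(kind(path) + "\t" + path);
        }
    }
}
```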
> > I can provide more logs if necessary. Any ideas or suggestions
> > about how I can track this down? Running the sequentialWrite test with
> > just 1 client works fine, but using 2 or more causes these errors.
> >
> > Thanks for any help,
> > Kareem Dana
> >
>
>
