hadoop-common-user mailing list archives

From:    闫雪冰 <yanxueb...@alibaba-inc.com>
Subject: RE: HBase PerformanceEvaluation failing
Date:    Mon, 19 Nov 2007 13:00:32 GMT
Hi dhruba,

What pointed us to SecureRandom was the stack trace in the DataNode log
file after the 'hadoop dfs -put' failed; my colleague York stepped in
from there and tracked the problem down.

I wrote a little program on my box (running FreeBSD4.11-YAHOO-20070423,
which you might be familiar with :P) to test the call to
SecureRandom.getInstance("SHA1PRNG").nextInt; the program always hung at
that call, and no exception was ever thrown.
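
In case it is useful, here is a minimal sketch of that test (the class
name is just for illustration; my actual program may have differed
slightly):
----------------------------------------------------------
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

public class SecureRandomTest {
  public static void main(String[] args) throws NoSuchAlgorithmException {
    System.out.println("calling SecureRandom.getInstance(\"SHA1PRNG\").nextInt ...");
    // On a healthy box this returns almost immediately; on my
    // FreeBSD 4.11 machine it hung here forever, with no exception.
    int n = SecureRandom.getInstance("SHA1PRNG").nextInt(Integer.MAX_VALUE);
    System.out.println("got " + n);
  }
}
----------------------------------------------------------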

Xuebing

-----Original Message-----
From: dhruba Borthakur [mailto:dhruba@yahoo-inc.com] 
Sent: Friday, November 16, 2007 2:05 PM
To: hadoop-user@lucene.apache.org
Subject: RE: HBase PerformanceEvaluation failing

Hello Xuebing Yan,

Your finding is interesting. The DataNode uses SecureRandom to generate a
random number. If the call throws a NoSuchAlgorithmException, it is
supposed to fall back to using a Random object. In your test case, did you
see this exception being generated? If not, can you please describe the
behaviour you saw that pointed you to the conclusion that SecureRandom was
causing the problem?
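
For reference, the logic I am describing is roughly the following (a
sketch of the pattern, not the exact DataNode source; LOG stands for the
DataNode's logger):
----------------------------------------------------------
int rand;
try {
  // Preferred path: a cryptographically strong generator.
  rand = SecureRandom.getInstance("SHA1PRNG").nextInt(Integer.MAX_VALUE);
} catch (NoSuchAlgorithmException e) {
  // Fallback: only reached if getInstance actually throws.
  LOG.warn("Could not use SecureRandom");
  rand = (new Random()).nextInt(Integer.MAX_VALUE);
}
----------------------------------------------------------
Note that if the SecureRandom call blocks instead of throwing, this catch
block is never reached and there is no fallback.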

Thanks a lot,
dhruba

-----Original Message-----
From: 闫雪冰 [mailto:yaxuebing@alibaba-inc.com] 
Sent: Thursday, November 15, 2007 6:09 PM
To: hadoop-user@lucene.apache.org
Subject: RE: HBase PerformanceEvaluation failing

Are you working on FreeBSD 4.11? Did you ever succeed in doing a 'dfs -put'
operation? 

I ran into very similar trouble a few days ago. In my case, I got an
"only be replicated to 0 nodes, instead of 1" msg when I tried to run the PE
program, and I found that I couldn't even manage a 'dfs -put', which gave
me the same error msg, though 'dfs -mkdir' succeeded ('-mkdir' only touches
NameNode metadata, while '-put' has to write blocks to a DataNode).

The reason is that SecureRandom doesn't work on my FreeBSD 4.11. I ended up
with two solutions:
	a) Go back to hadoop-0.14.3, which works fine with the same
configuration, or
	b) Comment out the SecureRandom block as below
----------------------------------------------------------
 /*
    try {
      rand = SecureRandom.getInstance("SHA1PRNG").nextInt(Integer.MAX_VALUE);
    } catch (NoSuchAlgorithmException e) {
      LOG.warn("Could not use SecureRandom");
      rand = (new Random()).nextInt(Integer.MAX_VALUE);
    }
 */
    rand = (new Random()).nextInt(Integer.MAX_VALUE);
----------------------------------------------------------
Hope it helps.
-Xuebing Yan

-----Original Message-----
From: Kareem Dana [mailto:kareem.dana@gmail.com] 
Sent: Friday, November 16, 2007 9:32 AM
To: hadoop-user@lucene.apache.org
Subject: Re: HBase PerformanceEvaluation failing

My DFS appears healthy. After the PE fails, the datanodes are still
running but all the HRegionServers have exited. My initial concern was
free hard drive space and memory. Each node has ~1.5GB of free space for
DFS and 400MB RAM / 256MB swap. Is this enough for the PE? I monitored
the free space as the PE ran; it never completely filled up, but it is
kind of tight.


On Nov 15, 2007 8:01 PM, stack <stack@duboce.net> wrote:
> Your DFS is healthy?  This seems odd: "File
> /tmp/hadoop-kcd/hbase/hregion_TestTable,2102165,6843477525281170954/info/mapfiles/6464987859396543981/data
> could only be replicated to 0 nodes, instead of 1;"  In my experience,
> IIRC, it means no datanodes running.
>
> (I just tried the PE from TRUNK and it ran to completion).
>
> St.Ack
>
>
> Kareem Dana wrote:
> > I'm trying to run the HBase PerformanceEvaluation program on a cluster
> > of 5 hadoop nodes (on virtual machines).
> >
> > hadoop07 is a DFS Master and HBase master
> > hadoop08-12 are HBase region servers
> >
> > I start the test as follows:
> >
> > $ bin/hadoop jar
> > ${HADOOP_HOME}build/contrib/hbase/hadoop-0.15.0-dev-hbase-test.jar
> > sequentialWrite 2
> >
> > This starts the sequentialWrite test with 2 clients. After about 25
> > minutes, with the map tasks about 25% complete and the reduces at 6%,
> > the test fails with the following error:
> > 2007-11-15 17:06:35,100 INFO org.apache.hadoop.mapred.TaskInProgress:
> > TaskInProgress tip_200711151626_0001_m_000002 has failed 1 times.
> > 2007-11-15 17:06:35,100 INFO org.apache.hadoop.mapred.JobInProgress:
> > Aborting job job_200711151626_0001
> > 2007-11-15 17:06:35,101 INFO org.apache.hadoop.mapred.TaskInProgress:
> > Error from task_200711151626_0001_m_000006_0:
> > org.apache.hadoop.hbase.NoServerForRegionException: failed to find
> > server for TestTable after 5 retries
> >       at org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:761)
> >       at org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:521)
> >       at org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:317)
> >       at org.apache.hadoop.hbase.HTable.commit(HTable.java:671)
> >       at org.apache.hadoop.hbase.HTable.commit(HTable.java:636)
> >       at org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest.testRow(PerformanceEvaluation.java:493)
> >       at org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:356)
> >       at org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:529)
> >       at org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:184)
> >       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
> >       at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >
> >
> > An HBase region server log shows these errors:
> > 2007-11-15 17:03:00,017 ERROR org.apache.hadoop.hbase.HRegionServer:
> > error closing region TestTable,2102165,6843477525281170954
> > org.apache.hadoop.hbase.DroppedSnapshotException: java.io.IOException:
> > File /tmp/hadoop-kcd/hbase/hregion_TestTable,2102165,6843477525281170954/info/mapfiles/6464987859396543981/data
> > could only be replicated to 0 nodes, instead of 1
> >         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1003)
> >         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
> >         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:585)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> >
> >         at org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:886)
> >         at org.apache.hadoop.hbase.HRegion.close(HRegion.java:388)
> >         at org.apache.hadoop.hbase.HRegionServer.closeAllRegions(HRegionServer.java:978)
> >         at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:593)
> >         at java.lang.Thread.run(Thread.java:595)
> > 2007-11-15 17:03:00,615 ERROR org.apache.hadoop.hbase.HRegionServer:
> > error closing region TestTable,3147654,8929124532081908894
> > org.apache.hadoop.hbase.DroppedSnapshotException: java.io.IOException:
> > File /tmp/hadoop-kcd/hbase/hregion_TestTable,3147654,8929124532081908894/info/mapfiles/3451857497397493742/data
> > could only be replicated to 0 nodes, instead of 1
> >         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1003)
> >         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
> >         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:585)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> >
> >         at org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:886)
> >         at org.apache.hadoop.hbase.HRegion.close(HRegion.java:388)
> >         at org.apache.hadoop.hbase.HRegionServer.closeAllRegions(HRegionServer.java:978)
> >         at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:593)
> >         at java.lang.Thread.run(Thread.java:595)
> > 2007-11-15 17:03:00,639 ERROR org.apache.hadoop.hbase.HRegionServer:
> > Close and delete failed
> > java.io.IOException: java.io.IOException: File
> > /tmp/hadoop-kcd/hbase/log_172.16.6.57_-3889232888673408171_60020/hlog.dat.005
> > could only be replicated to 0 nodes, instead of 1
> >         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1003)
> >         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
> >         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:585)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> >
> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> >         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> >         at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
> >         at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
> >         at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
> >         at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:597)
> >         at java.lang.Thread.run(Thread.java:595)
> > 2007-11-15 17:03:00,640 INFO org.apache.hadoop.hbase.HRegionServer:
> > telling master that region server is shutting down at:
> > 172.16.6.57:60020
> > 2007-11-15 17:03:00,643 INFO org.apache.hadoop.hbase.HRegionServer:
> > stopping server at: 172.16.6.57:60020
> > 2007-11-15 17:03:00,643 INFO org.apache.hadoop.hbase.HRegionServer:
> > regionserver/0.0.0.0:60020 exiting
> >
> > I can provide more logs if necessary. Any ideas or suggestions
> > about how to track this down? Running the sequentialWrite test with
> > just 1 client works fine, but using 2 or more causes these errors.
> >
> > Thanks for any help,
> > Kareem Dana
> >
>
>
