hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Starting HBase in fully distributed mode...
Date Fri, 04 Dec 2009 20:53:22 GMT
The first two definitions here are what I'm talking about:
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1346

So by default it usually doesn't listen on the interface associated
with the hostname ec2-IP-compute-1.amazonaws.com but on the other one
(IIRC starts with dom-).
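One quick way to sanity-check this (a rough sketch, not EC2-specific; "localhost" stands in for the instance's public hostname, and 2181 is the ZooKeeper client port used later in this thread):

```shell
# Which address will peers get when they resolve the hostname?
getent hosts localhost

# Which sockets are actually bound on the ZooKeeper client port, if any?
# Prints nothing when no daemon is listening; the || true keeps the snippet
# from failing on machines without netstat or without a listener.
netstat -an | grep 2181 || true
```

If the address the hostname resolves to and the address the daemon is bound on don't match, clients connecting by hostname get exactly the "Connection refused" in the logs quoted below.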

J-D

On Fri, Dec 4, 2009 at 12:41 PM, Patrick Hunt <phunt@apache.org> wrote:
> I'm not familiar with ec2, when you say "listen on private hostname" what
> does that mean? Do you mean "by default listen on an interface with a
> non-routable (localonly) ip"? Or something else. Is there an aws page you
> can point me to?
>
> Patrick
>
> Jean-Daniel Cryans wrote:
>>
>> When you saw:
>>
>> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
>> /ebs1/mapred/system,/ebs2/mapred/system. Name node is in safe mode.
>> The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.
>> *Safe
>> mode will be turned off automatically*.
>>
>> It means that HDFS is blocking everything (aka safe mode) until all
>> datanodes have reported for duty (and then it waits for 30 seconds to
>> make sure).
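The check (and the wait) can be scripted with Hadoop's dfsadmin tool; a sketch assuming the hadoop script from the thread's 0.20.1 install is on the PATH of a machine that can reach the NameNode:

```shell
# Ask the NameNode whether it is still in safe mode; "wait" blocks until it
# leaves safe mode on its own, which is handy at the top of an HBase start
# script. The guard just skips quietly on machines without hadoop installed.
if command -v hadoop > /dev/null; then
  hadoop dfsadmin -safemode get
  hadoop dfsadmin -safemode wait
fi
```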
>>
>> When you saw:
>>
>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>> KeeperErrorCode = *NoNode for /hbase/master*
>>
>> It means that the Master didn't write its znode in ZooKeeper
>> because... when you saw:
>>
>> 2009-12-04 07:07:37,149 WARN org.apache.zookeeper.ClientCnxn: Exception
>> closing session 0x0 to sun.nio.ch.SelectionKeyImpl@10e35d5
>> java.net.ConnectException: Connection refused
>>
>> It really means that the connection was refused. It then says it
>> attempted to connect to ec2-174-129-127-141.compute-1.amazonaws.com
>> but wasn't able to. AFAIK in EC2 the Java processes tend to listen on
>> their private hostname, not the public one (which would be bad
>> anyway).
>>
>> Bottom line: make sure each process listens where it's expected to, and
>> things should then work.
>>
>> J-D
>>
>> On Fri, Dec 4, 2009 at 11:23 AM, Something Something
>> <mailinglists19@gmail.com> wrote:
>>>
>>> Hadoop: 0.20.1
>>>
>>> HBase: 0.20.2
>>>
>>> Zookeeper: The one which gets started by default by HBase.
>>>
>>>
>>> HBase logs:
>>>
>>> 1)  Master log shows this WARN message, but then it says 'connection
>>> successful'
>>>
>>>
>>> 2009-12-04 07:07:37,149 WARN org.apache.zookeeper.ClientCnxn: Exception
>>> closing session 0x0 to sun.nio.ch.SelectionKeyImpl@10e35d5
>>> java.net.ConnectException: Connection refused
>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>       at
>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>>       at
>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
>>> 2009-12-04 07:07:37,150 WARN org.apache.zookeeper.ClientCnxn: Ignoring
>>> exception during shutdown input
>>> java.nio.channels.ClosedChannelException
>>>       at
>>> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
>>>       at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>       at
>>> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
>>>       at
>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>> 2009-12-04 07:07:37,150 WARN org.apache.zookeeper.ClientCnxn: Ignoring
>>> exception during shutdown output
>>> java.nio.channels.ClosedChannelException
>>>       at
>>> sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
>>>       at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>>       at
>>> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
>>>       at
>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>> 2009-12-04 07:07:37,199 INFO
>>> org.apache.hadoop.hbase.master.RegionManager:
>>> -ROOT- region unset (but not set to be reassigned)
>>> 2009-12-04 07:07:37,200 INFO
>>> org.apache.hadoop.hbase.master.RegionManager:
>>> ROOT inserted into regionsInTransition
>>> 2009-12-04 07:07:37,667 INFO org.apache.zookeeper.ClientCnxn: Attempting
>>> connection to server
>>> ec2-174-129-127-141.compute-1.amazonaws.com/10.252.146.65:2181
>>> 2009-12-04 07:07:37,668 INFO org.apache.zookeeper.ClientCnxn: Priming
>>> connection to java.nio.channels.SocketChannel[connected local=/
>>> 10.252.162.19:46195 remote=
>>> ec2-174-129-127-141.compute-1.amazonaws.com/10.252.146.65:2181]
>>> 2009-12-04 07:07:37,670 INFO org.apache.zookeeper.ClientCnxn: Server
>>> connection successful
>>>
>>>
>>>
>>> 2)  Regionserver log shows this... but later seems to have recovered:
>>>
>>> 2009-12-04 07:07:36,576 WARN org.apache.zookeeper.ClientCnxn: Exception
>>> closing session 0x0 to sun.nio.ch.SelectionKeyImpl@4ee70b
>>> java.net.ConnectException: Connection refused
>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>       at
>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>>       at
>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
>>> 2009-12-04 07:07:36,611 WARN org.apache.zookeeper.ClientCnxn: Ignoring
>>> exception during shutdown input
>>> java.nio.channels.ClosedChannelException
>>>       at
>>> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
>>>       at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>       at
>>> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
>>>       at
>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>> 2009-12-04 07:07:36,611 WARN org.apache.zookeeper.ClientCnxn: Ignoring
>>> exception during shutdown output
>>> java.nio.channels.ClosedChannelException
>>>       at
>>> sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
>>>       at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>>       at
>>> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
>>>       at
>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>> 2009-12-04 07:07:36,742 WARN
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set watcher
>>> on
>>> ZNode /hbase/master
>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>> KeeperErrorCode = ConnectionLoss for /hbase/master
>>>       at
>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>>       at
>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>       at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:780)
>>>       at
>>>
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.watchMasterAddress(ZooKeeperWrapper.java:304)
>>>       at
>>>
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:385)
>>>       at
>>>
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.reinitializeZooKeeper(HRegionServer.java:315)
>>>       at
>>>
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.reinitialize(HRegionServer.java:306)
>>>       at
>>>
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:276)
>>>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>>> Method)
>>>       at
>>>
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>>       at
>>>
>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>>       at
>>>
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.doMain(HRegionServer.java:2474)
>>>       at
>>>
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2542)
>>> 2009-12-04 07:07:36,743 WARN
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to set watcher
>>> on
>>> ZooKeeper master address. Retrying.
>>>
>>>
>>>
>>> 3)  ZooKeeper log:  Nothing much in there... just a starting message
>>> line, followed by
>>>
>>> ulimit -n 1024
>>>
>>> I looked at the archives.  There was one mail that talked about
>>> 'ulimit'.  I wonder if that has something to do with it.
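For reference, the limit a daemon inherits comes from the shell that launches it and can be checked directly; the 32768 figure and the "hadoop" user name below are common examples, not values from this thread:

```shell
# Print the per-process open-file limit the daemons will inherit. HBase and
# ZooKeeper routinely exhaust the common default of 1024 under load.
ulimit -n

# A persistent raise is usually done as root in /etc/security/limits.conf,
# with a line along the lines of:
#   hadoop  -  nofile  32768
# (it takes effect on that user's next login)
```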
>>>
>>> Thanks for your help.
>>>
>>>
>>>
>>> On Fri, Dec 4, 2009 at 8:18 AM, Mark Vigeant
>>> <mark.vigeant@riskmetrics.com>wrote:
>>>
>>>> When I first started my HBase cluster, it too gave me the NoNode for
>>>> /hbase/master error several times before it started working, and I
>>>> believe this is a common beginner's error (I've seen it in a few
>>>> emails in the past 2 weeks).
>>>>
>>>> What versions of HBase, Hadoop and ZooKeeper are you using?
>>>>
>>>> Also, take a look in your HBASE_HOME/logs folder. That would be a good
>>>> place to start looking for some answers.
>>>>
>>>> -Mark
>>>>
>>>> -----Original Message-----
>>>> From: Something Something [mailto:mailinglists19@gmail.com]
>>>> Sent: Friday, December 04, 2009 2:28 AM
>>>> To: hbase-user@hadoop.apache.org
>>>> Subject: Starting HBase in fully distributed mode...
>>>>
>>>> Hello,
>>>>
>>>> I am trying to get Hadoop/HBase up and running in a fully distributed
>>>> mode.
>>>>  For now, I have only *1 Master & 2 Slaves*.
>>>>
>>>> Hadoop starts correctly... I think.  The only exception I see in the
>>>> various log files is this one:
>>>>
>>>>
>>>> org.apache.hadoop.ipc.RemoteException:
>>>> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
>>>> /ebs1/mapred/system,/ebs2/mapred/system. Name node is in safe mode.
>>>> The ratio of reported blocks 0.0000 has not reached the threshold
>>>> 0.9990.
>>>> *Safe
>>>> mode will be turned off automatically*.
>>>>       at
>>>>
>>>>
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1696)
>>>>       at
>>>>
>>>>
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1676)
>>>>       at
>>>>
>>>> org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:517)
>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>
>>>>
>>>> Somehow this doesn't sound critical, so I assumed everything was good to
>>>> go
>>>> with Hadoop.
>>>>
>>>>
>>>> So then I started HBase and opened a shell (hbase shell).  So far
>>>> everything
>>>> looks good.  Now when I try to run a 'list' command, I keep getting this
>>>> message:
>>>>
>>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>>>> KeeperErrorCode = *NoNode for /hbase/master*
>>>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892)
>>>> at
>>>>
>>>>
>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:328)
>>>>
>>>>
>>>> Here's what I have in my *Master hbase-site.xml*
>>>>
>>>> <configuration>
>>>>  <property>
>>>>   <name>hbase.rootdir</name>
>>>>   <value>hdfs://master:54310/hbase</value>
>>>>  </property>
>>>>  <property>
>>>>   <name>hbase.cluster.distributed</name>
>>>>   <value>true</value>
>>>>  </property>
>>>>  <property>
>>>>   <name>hbase.zookeeper.property.clientPort</name>
>>>>   <value>2181</value>
>>>>  </property>
>>>>  <property>
>>>>   <name>hbase.zookeeper.quorum</name>
>>>>   <value>master,slave1,slave2</value>
>>>>  </property>
>>>> <property>
>>>>
>>>>
>>>>
>>>> The *Slave* hbase-site.xml files are set as follows:
>>>>
>>>>  <property>
>>>>   <name>hbase.rootdir</name>
>>>>   <value>hdfs://master:54310/hbase</value>
>>>>  </property>
>>>>  <property>
>>>>   <name>hbase.cluster.distributed</name>
>>>>   <value>false</value>
>>>>  </property>
>>>>  <property>
>>>>   <name>hbase.zookeeper.property.clientPort</name>
>>>>   <value>2181</value>
>>>>  </property>
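(For comparison, a slave file that mirrors the master's distributed settings would look like the sketch below. Whether that is the fix here is only a guess, but note that with hbase.cluster.distributed set to false and no hbase.zookeeper.quorum, a 0.20 regionserver falls back to looking for ZooKeeper on localhost.)

```xml
<configuration>
 <property>
  <name>hbase.rootdir</name>
  <value>hdfs://master:54310/hbase</value>
 </property>
 <property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
 </property>
 <property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
 </property>
 <property>
  <name>hbase.zookeeper.quorum</name>
  <value>master,slave1,slave2</value>
 </property>
</configuration>
```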
>>>>
>>>>
>>>> In the hbase-env.sh file on ALL 3 machines I have set JAVA_HOME and
>>>> set the HBase classpath as follows:
>>>>
>>>> export HBASE_CLASSPATH=$HBASE_CLASSPATH:/ebs1/hadoop-0.20.1/conf
>>>>
>>>>
>>>> On *Master* I have added the Master & Slave hostnames to the
>>>> *regionservers* file.  On *slaves*, the regionservers file is empty.
>>>>
>>>>
>>>> I have run hadoop namenode -format multiple times, but I still keep
>>>> getting "NoNode for /hbase/master".  What step did I miss?  Thanks
>>>> for your help.
>>>>
>>>>
>
