hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Re: While starting 3-nodes cluster hbase: WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null
Date Tue, 30 Apr 2013 16:38:32 GMT
Hi John,

Thanks for sharing that. Might help other people who are facing the same
issues.

JM

2013/4/30 John Foxinhead <john.foxinhead@gmail.com>

> Now I post my configurations:
> I use a 3-node cluster with all nodes running Hadoop, ZooKeeper and HBase.
> The HBase master, a ZooKeeper daemon and the Hadoop namenode run on the same
> host. An HBase regionserver, a ZooKeeper daemon and a Hadoop datanode run on
> the other 2 nodes. I called one of the datanodes "jobtracker" because of the
> various configurations I tried, but it is a datanode just like "datanode1":
> I also configured the jobtracker while installing Hadoop, but I never used
> it as a jobtracker, only as a datanode, because HBase doesn't need MapReduce.
> I run everything on the same PC: the 3 nodes are 3 virtual machines running
> on VirtualBox, connected through "internal network" or "bridged adapter"
> network interfaces (these are VirtualBox settings).
> It's important to know that, because I use 3 virtual machines, communication
> is very slow, especially at startup of Hadoop, ZooKeeper and HBase.
>
>
> HADOOP:
>
> hadoop-env.sh:
> export JAVA_HOME=/usr/lib/jvm/java-7-oracle
> export HADOOP_CLASSPATH=/home/debian/hadoop-1.0.4/lib
> export HADOOP_HEAPSIZE=1000
>
> core-site.xml:
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://namenode:9000/</value>
>   </property>
> </configuration>
>
> hdfs-site.xml:
> <configuration>
>   <property>
>     <name>dfs.name.dir</name>
>     <value>/home/debian/hadoop-1.0.4/FILESYSTEM/name</value>
>   </property>
>   <property>
>     <name>dfs.data.dir</name>
>     <value>/home/debian/hadoop-1.0.4/FILESYSTEM/data</value>
>   </property>
>   <property>
>     <name>dfs.support.append</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>dfs.datanode.max.xcievers</name>
>     <value>4096</value>
>   </property>
> </configuration>
>
> masters:
> slaves:
> jobtracker
> datanode1
>
>
> HBASE:
>
> hbase-env.sh
> export JAVA_HOME=/usr/lib/jvm/java-7-oracle
> export HBASE_CLASSPATH=/home/debian/hbase-0.94.5/lib
> export HBASE_MANAGES_ZK=false
>
> hbase-site.xml:
> <configuration>
>     <property>
>         <name>dfs.support.append</name>
>         <value>true</value>
>     </property>
>     <property>
>         <name>hbase.rootdir</name>
>         <value>hdfs://namenode:9000/hbase</value>
>     </property>
>     <property>
>         <name>hbase.cluster.distributed</name>
>         <value>true</value>
>     </property>
>     <property>
>         <name>hbase.zookeeper.quorum</name>
>         <value>namenode,jobtracker,datanode1</value>
>     </property>
>     <property>
>         <name>hbase.zookeeper.property.dataDir</name>
>         <value>/home/debian/hbase-0.94.5/zookeeper/data</value>
>     </property>
>     <property>
>         <name>hbase.master</name>
>         <value>namenode:60000</value>
>     </property>
> </configuration>
> note: I think the property hbase.master hasn't worked for years, so it can
> be deleted, but after a lot of tries my HBase worked, so I left it there.
> I'll try to delete it later.
> regionservers:
> jobtracker
> datanode
>
>
> OS FILES:
>
> /etc/hosts:
>
> 127.0.0.1    localhost
> 127.0.0.1    debian01
> #  HADOOP
> 192.168.1.111    jobtracker
> 192.168.1.112    datanode1
> 192.168.1.121    namenode
> # The following lines are desirable for IPv6 capable hosts
> ::1     ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
>
> /etc/hostname:
> namenode (or jobtracker, or datanode1, depending on the node)
>
> /etc/network/interfaces (to set static IPs: on namenode: address
> 192.168.1.121, on jobtracker: address 192.168.1.111, on datanode1: address
> 192.168.1.112):
> iface eth6 inet static
> address 192.168.1.121
> netmask 255.255.255.0
> network 192.168.1.0
> broadcast 192.168.1.255
> gateway 192.168.1.254
> dns-nameserver 8.8.8.8 8.8.4.4
> note: eth6 because eth2 (where I had the "bridged network adapter" virtual
> interface) was remapped to eth6 (you can verify it with "$ dmesg | grep
> eth"), so replace eth6 with your interface.
>
>
> MY PROBLEMS (note that I copied the hbase and hadoop directories from the
> working pseudo-distributed installation, so the pseudo-distributed version
> works):
>
> 1) After starting up Hadoop, trying some shell commands to put a file into
> the Hadoop filesystem and later get the same file back from HDFS, I get the
> file, but the file is empty.
> SOLUTION: the FILESYSTEM, FILESYSTEM/data and FILESYSTEM/name directories
> must have 755 (rwxr-xr-x) permissions.
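>
> As a rough sketch of that fix and check, assuming the dfs.name.dir /
> dfs.data.dir paths from the hdfs-site.xml above (the /hosts-test names are
> just example file names):
>
> # on the namenode: local storage directories must be 755
> chmod 755 /home/debian/hadoop-1.0.4/FILESYSTEM /home/debian/hadoop-1.0.4/FILESYSTEM/name
> # on each datanode
> chmod 755 /home/debian/hadoop-1.0.4/FILESYSTEM /home/debian/hadoop-1.0.4/FILESYSTEM/data
>
> # round-trip test from the namenode (example names): the retrieved copy should not be empty
> ~/hadoop-1.0.4/bin/hadoop fs -put /etc/hosts /hosts-test
> ~/hadoop-1.0.4/bin/hadoop fs -get /hosts-test /tmp/hosts-test
> ls -l /tmp/hosts-test
>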
> 2) After starting up Hadoop, trying some shell commands to put a file into
> the Hadoop filesystem and later get the same file back from HDFS, I receive
> warnings/errors (in the log files) about a mismatch between the expected and
> the received IDs of the blocks.
> EXPLANATION: It can happen if, after using an HDFS (for example putting
> files into it), I use "bin/hadoop namenode -format" to format a new HDFS,
> and I have changed dfs.data.dir and dfs.name.dir to a persistent location
> (the default is a tmp location, which is cleared on restart of the OS).
> "bin/hadoop namenode -format" formats the dfs.name.dir directory and gets a
> new ID for HDFS blocks. It doesn't format the dfs.data.dir directory on the
> datanodes, so the datanodes expect the old block IDs and there is a
> mismatch.
> SOLUTION: clear all dfs.data.dir directories on all datanodes, then
> reformat a new filesystem using "bin/hadoop namenode -format" on the namenode.
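>
> A rough sketch of that cleanup, assuming the dfs.name.dir / dfs.data.dir
> values from the hdfs-site.xml above (this wipes everything stored in HDFS,
> so only do it on a cluster whose data you don't need):
>
> # on the namenode: stop HDFS first
> ~/hadoop-1.0.4/bin/stop-dfs.sh
> # on each datanode: clear the old block storage with its stale IDs
> rm -rf /home/debian/hadoop-1.0.4/FILESYSTEM/data/*
> # back on the namenode: reformat and restart
> ~/hadoop-1.0.4/bin/hadoop namenode -format
> ~/hadoop-1.0.4/bin/start-dfs.sh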
>
> 3) HBase, when it manages zookeeper itself, can't connect to zookeeper (1):
> SOLUTION: set HBASE_MANAGES_ZK=false, so that hbase does not manage
> zookeeper. It's recommended in my case because I launch 3 virtual machines,
> so hbase fails to connect to zookeeper because it reaches the retry limit
> before the zookeeper cluster has started up completely. So I run zookeeper
> on all 3 nodes with "$ bin/hbase-daemon.sh start zookeeper" and I wait a few
> minutes. This is because of the slow connection between the 3 virtual
> machines. Then I test the zookeeper cluster with some "ls /" commands from
> the zk shell (launch it with "$ bin/hbase zkcli") and I make sure the shell
> connects to the right node on the right port when launching the zk shell
> (which is a zk client).
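>
> Put together, the manual start-up and check (using only the commands above)
> looks like this; run the daemon start on namenode, jobtracker and datanode1:
>
> # on every quorum node: start the ZooKeeper daemon by hand
> ~/hbase-0.94.5/bin/hbase-daemon.sh start zookeeper
> # wait a few minutes, then from any node open the zk shell
> ~/hbase-0.94.5/bin/hbase zkcli
> # inside the shell, "ls /" should list the root znodes, and the connection
> # string printed at startup should show a quorum host on port 2181, not localhost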
>
> 4) HBase, without managing zookeeper, can't connect to zookeeper. All
> configurations were right, as written above, but hbase launched a 1-node
> zookeeper cluster on the master at localhost and connected to it. Also, the
> master didn't start the regionservers. It's a strange problem.
> SOLUTION: This solution is as strange as the problem. The configuration
> files were right, but hbase didn't work, so I opened a regionserver virtual
> machine, completely removed the hbase directories, copied the
> pseudo-distributed hbase directory and renamed it as the previous one. I
> manually copied all the configuration files from the hbase directory on the
> master. Then I closed all the virtual machines, made a backup of the old
> master, and deleted all the VMs except the backup and the slave VM into
> which I had re-copied the configuration files. I made 2 more clones of this
> virtual machine (with the new hbase folder) and modified only
> /etc/network/interfaces, to set the proper IP for each of the VMs. Then
> hbase was able to connect to the zookeeper cluster and was able to start the
> regionservers. I think it was because of some rubbish left over from the
> many tries I made on the master node, so copying the conf files to a slave
> node and making it become the new master solved my problem. Then I made
> another backup, to keep the system clean from future rubbish and from
> problems like this.
>
> 5) HBase connects to the running zookeeper cluster but there is one last
> problem: the master launches the regionservers, but on the regionservers'
> nodes, when the regionserver daemon starts, it tries to connect to the
> master at localhost:60000, instead of connecting to namenode:60000.
> SOLUTION: The property hbase.master is useless because it hasn't been
> supported for years. So the problem is the file /etc/hostname. Its content
> was "debian01" on all the nodes, but it should be "namenode" on the
> namenode, "datanode" on the datanode and "jobtracker" on the jobtracker
> (the hostname used in the hbase conf files to refer to each node). This was
> my last configuration change. When I changed this too, hbase finally worked
> properly.
> Note: just logging out and back in will not make the changes in
> /etc/hostname effective; in fact, when you log back in you'll still see, for
> example, something like "debian@debian01", even if you already replaced
> "debian01" with "namenode". You need to completely shut down the OS and
> restart it for the changes to take effect.
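>
> A minimal sketch of that hostname change, assuming each VM gets the name
> used for it in the conf files and that sudo is available (otherwise edit
> /etc/hostname as root):
>
> # on the master VM, for example
> echo namenode | sudo tee /etc/hostname
> # a full shutdown and restart is needed, a simple relogin is not enough
> sudo reboot
> # after the restart, verify it
> hostname    # should now print "namenode"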
>
>
> Now Hadoop, ZooKeeper and HBase work, and some jars compiled to test simple
> operations like Put and Get, not from the hbase shell but from the HBase
> Java API, also work.
> Thank you all, and I hope someone else can take advantage of my issues.
>
> 2013/4/30 John Foxinhead <john.foxinhead@gmail.com>
>
> > I solved the last problem:
> > I modified the file /etc/hostname and replaced the default hostname,
> > "debian01", with "namenode", "jobtracker", or "datanode", the hostnames I
> > used in the hbase conf files. Now I start hbase from the master with
> > "bin/start-hbase.sh" and the regionservers, instead of trying to connect
> > to the master at localhost:60000, connect to namenode:60000.
> > Now everything is working well. Thank you all. Later I will post my
> > configuration files and make a summary of the problems I encountered, so
> > that other users can take advantage of those.
> >
> >
> > 2013/4/30 John Foxinhead <john.foxinhead@gmail.com>
> >
> >> I solved my problem with zookeeper. I don't know how, maybe it was a
> >> spell xD
> >> I did it this way: on a slave I removed the hbase directory, and I
> >> copied the directory of the pseudo-distributed hbase (which works). Then
> >> I copied all the configuration files from the virtual machine which ran
> >> as master into the new directory, making it distributed. Then I cloned
> >> that virtual machine 2 times, I made some changes in the
> >> /etc/network/interfaces file to set the proper IP on the VMs, and then
> >> zookeeper magically worked. All the configurations were the same. Maybe
> >> I made some wrong configuration in some OS file, or there was some
> >> rubbish left by the hundreds of tries I made on the master. So, changing
> >> which VM works as master solved my problem.
> >> Now:
> >> - I start HDFS with "$ ~/hadoop-1.0.4/bin/start-dfs.sh"
> >> - I try some commands from the hadoop shell to ensure it works (I found
> >> out that the directories on the local fs that the datanodes and namenode
> >> use as storage space for HDFS file blocks need to have permission 755;
> >> otherwise, even if the permissions are wider, when you put a file into
> >> HDFS the file is created in HDFS, but its content isn't transferred, so
> >> when you get the file back you find out that it is empty)
> >> - I start zookeeper on my 3 VMs with
> >> "$ ~/hbase-0.94.5/bin/hbase-daemon.sh start zookeeper" and I wait 2-3
> >> minutes to be sure zookeeper has completely started. Then I check the
> >> logs for errors or warnings, and I use "$ ~/hbase-0.94.5/bin/hbase zkcli"
> >> with some "ls" commands to ensure the client connects to zookeeper on the
> >> right node and port (2181). Related to zookeeper, I found out that with
> >> HBASE_MANAGES_ZK=true in the hbase-env.sh file, there was an error
> >> because zookeeper doesn't have time to set up properly before the hbase
> >> master is launched. So, with a lot of VMs (I use 3, and they are a lot)
> >> it's better to set HBASE_MANAGES_ZK=false and start it manually on the
> >> nodes, so that you can wait until zookeeper is set up before launching
> >> the master.
> >> - Everything works properly up to now, so I start hbase with
> >> "$ ~/hbase-0.94.5/bin/start-hbase.sh". Now the output shows that the
> >> master also launches the regionservers on the regionservers' nodes (good,
> >> because before it showed only that the master was launched on localhost,
> >> but nothing about the regionservers). When I look at the log files in
> >> both the master's and the regionservers' logs directories, they show that
> >> the hbase daemons connect properly to the zookeeper cluster listed in the
> >> hbase.zookeeper.quorum property in hbase-site.xml, and the port is also
> >> right (2181, the same used by the zkcli tool).
> >>
> >> Now the problem is that the master starts on localhost:60000, not on
> >> namenode:60000, so on the master node it's ok, but when the regionservers
> >> try to connect to the master at localhost:60000 they (naturally) find
> >> nothing there, a MasterNotRunningException is thrown, and the
> >> regionservers, after connecting to zookeeper, crash because of that.
> >> I found out in the regionservers' log files that they connect to the
> >> zookeeper cluster and then crash because they don't find a running master
> >> on localhost:60000, which is consistent. But the strange thing is that in
> >> the conf files I never used "localhost". I also tried to set the property
> >> hbase.master to namenode:60000, but this property hasn't been used for
> >> years, so it doesn't work anymore. What can I do?
> >>
> >
> >
>
