Subject: Re: Re: While starting 3-nodes cluster hbase: WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null
From: John Foxinhead <john.foxinhead@gmail.com>
To: user@hbase.apache.org
Date: Tue, 30 Apr 2013 17:47:38 +0200

Now I post my configurations.

I use a 3-node cluster with all nodes running Hadoop, ZooKeeper and HBase. The HBase master, a ZooKeeper daemon and the Hadoop namenode run on the same host; an HBase regionserver, a ZooKeeper daemon and a Hadoop datanode run on each of the other 2 nodes. I called one of the datanodes "jobtracker" because of the various configurations I tried: I also configured a jobtracker while installing Hadoop, but I never used that node as a jobtracker, only as a datanode like "datanode1", because HBase does not need MapReduce. I run everything on the same pc: the 3 nodes are 3 virtual machines running on VirtualBox, connected through "internal network" or "bridged adapter" network interfaces (these are VirtualBox settings). It's important to know this because, with 3 virtual machines on one pc, communication is very slow, especially at startup of Hadoop, ZooKeeper and HBase.
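A quick way to double-check that layout once the daemons are up is jps, which ships with the JDK (so this assumes $JAVA_HOME/bin is on the PATH). Roughly, and only as a sketch of what I would expect rather than an actual transcript:

    $ jps
    # on the master (namenode) the list should include something like:
    #   NameNode, HMaster, HQuorumPeer   (HQuorumPeer is the ZooKeeper daemon
    #                                     started via hbase-daemon.sh)
    # on jobtracker and datanode1:
    #   DataNode, HRegionServer, HQuorumPeer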
HADOOP:

hadoop-env.sh:
    export JAVA_HOME=/usr/lib/jvm/java-7-oracle
    export HADOOP_CLASSPATH=/home/debian/hadoop-1.0.4/lib
    export HADOOP_HEAPSIZE=1000

core-site.xml (property name = value):
    fs.default.name = hdfs://namenode:9000/

hdfs-site.xml:
    dfs.name.dir = /home/debian/hadoop-1.0.4/FILESYSTEM/name
    dfs.data.dir = /home/debian/hadoop-1.0.4/FILESYSTEM/data
    dfs.support.append = true
    dfs.datanode.max.xcievers = 4096

masters:

slaves:
    jobtracker
    datanode1

HBASE:

hbase-env.sh:
    export JAVA_HOME=/usr/lib/jvm/java-7-oracle
    export HBASE_CLASSPATH=/home/debian/hbase-0.94.5/lib
    export HBASE_MANAGES_ZK=false

hbase-site.xml (property name = value):
    dfs.support.append = true
    hbase.rootdir = hdfs://namenode:9000/hbase
    hbase.cluster.distributed = true
    hbase.zookeeper.quorum = namenode,jobtracker,datanode1
    hbase.zookeeper.property.dataDir = /home/debian/hbase-0.94.5/zookeeper/data
    hbase.master = namenode:60000

Note: I think the hbase.master property has not worked for years, so it can probably be deleted, but after a lot of tries my HBase worked with it there, so I left it. I'll try deleting it later.

regionservers:
    jobtracker
    datanode

OS FILES:

/etc/hosts:
    127.0.0.1   localhost
    127.0.0.1   debian01
    # HADOOP
    192.168.1.111   jobtracker
    192.168.1.112   datanode1
    192.168.1.121   namenode
    # The following lines are desirable for IPv6 capable hosts
    ::1     ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters

/etc/hostname:
    namenode   (or jobtracker, or datanode1, depending on the node)

/etc/network/interfaces (to set static IPs: address 192.168.1.121 on namenode, 192.168.1.111 on jobtracker, 192.168.1.112 on datanode1):
    iface eth6 inet static
        address 192.168.1.121
        netmask 255.255.255.0
        network 192.168.1.0
        broadcast 192.168.1.255
        gateway 192.168.1.254
        dns-nameserver 8.8.8.8 8.8.4.4

Note: eth6 because eth2 (where I had the "bridged network adapter" virtual interface) was remapped to eth6 (you can verify this with "$ dmesg | grep eth"), so replace eth6 with your own interface.

MY PROBLEMS (note that I copied the hbase and hadoop directories from the working pseudo-distributed installation, so the pseudo-distributed version works):

1) After starting up Hadoop and trying some shell commands to put a file into the Hadoop filesystem and later get the same file back from HDFS, I get the file, but it is empty.
SOLUTION: the FILESYSTEM, FILESYSTEM/data and FILESYSTEM/name directories must have 755 (rwxr-xr-x) permissions.

2) After starting up Hadoop and trying some shell commands to put a file into HDFS and later get it back, I get warnings/errors (in the log files) about a mismatch between the expected and the received block IDs.
EXPLANATION: this can happen if, after using an HDFS (for example putting files into it), I run "bin/hadoop namenode -format" to format a new HDFS, having changed dfs.data.dir and dfs.name.dir to a persistent location (the default is a tmp location, which is cleared when the OS restarts). "bin/hadoop namenode -format" formats the dfs.name.dir directory and generates a new ID for the HDFS blocks, but it does not format the dfs.data.dir directories on the datanodes, so the datanodes still expect the old block IDs and there is a mismatch.
SOLUTION: clear all dfs.data.dir directories on all datanodes, then format a new filesystem with "bin/hadoop namenode -format" on the namenode. (The commands for 1) and 2) are sketched after the summary below.)

3) HBase, while managing ZooKeeper itself, can't connect to ZooKeeper.
SOLUTION: set HBASE_MANAGES_ZK=false, so that HBase does not manage ZooKeeper.
This is recommended in my case because I launch 3 virtual machines, so HBase reaches its retry limit before the ZooKeeper cluster has started up completely. So I run ZooKeeper on all 3 nodes with "$ bin/hbase-daemon.sh start zookeeper" and wait a few minutes; this is because of the slow connection between the 3 virtual machines. Then I test the ZooKeeper cluster with some "ls /" commands from the zk shell (launch it with "$ bin/hbase zkcli") and I make sure the shell (which is a zk client) connects to the right node on the right port.

4) HBase, without managing ZooKeeper, can't connect to ZooKeeper. All the configuration was right, as written above, but HBase launched a 1-node ZooKeeper cluster on the master at localhost and connected to it, and the master also did not start the regionservers. It's a strange problem.
SOLUTION: this solution is as strange as the problem. The configuration files were right, but HBase didn't work, so I opened a regionserver virtual machine, completely removed the hbase directory, copied the pseudo-distributed hbase directory and renamed it like the previous one. I manually copied all the configuration files from the hbase directory on the master. Then I shut down all the virtual machines, made a backup of the old master, and deleted all the VMs except the backup and the slave VM into which I had re-copied the configuration files. I made 2 more clones of this virtual machine (with the new hbase folder) and modified only /etc/network/interfaces, setting the proper IP on each of the VMs. After that HBase was able to connect to the ZooKeeper cluster and to start the regionservers. I think it was because of some rubbish left behind by the many tries I made on the master node, so copying the conf files to a slave node and making it the new master solved my problem. Then I made another backup, to keep a system clean of future rubbish and avoid problems like this.

5) HBase connects to the running ZooKeeper cluster, but there is one last problem: the master launches the regionservers, but on the regionserver nodes, when the regionserver daemon starts, it tries to connect to the master at localhost:60000 instead of namenode:60000.
SOLUTION: the hbase.master property is useless because it has not been supported for years, so the problem is the file /etc/hostname. Its content was "debian01" on all the nodes, but it should be "namenode" on the namenode, "datanode" on the datanode and "jobtracker" on the jobtracker (the hostname used in the hbase conf files for each node). This was my last configuration change; when I changed this too, HBase finally worked properly. Note: just relogging will not make the change to /etc/hostname effective; in fact, when you relog you will still see something like "debian@debian01" even if you already replaced "debian01" with "namenode". You need to completely shut down the OS and restart it for the change to take effect. (A small sketch of this follows below.)

Now Hadoop, ZooKeeper and HBase work, and some jars I compiled to test simple operations like Put and Get, not from the hbase shell but through the HBase Java API, work too. Thank you all, and I hope someone else can take advantage of my issues.
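To make problems 1) and 2) concrete, this is roughly what the fix looks like from the shell; the paths are the dfs.name.dir / dfs.data.dir values from my hdfs-site.xml above, so adapt them to your own setup:

    # problem 1: the local directories backing HDFS need 755 (rwxr-xr-x)
    chmod 755 /home/debian/hadoop-1.0.4/FILESYSTEM \
              /home/debian/hadoop-1.0.4/FILESYSTEM/name \
              /home/debian/hadoop-1.0.4/FILESYSTEM/data

    # problem 2: clear dfs.data.dir on EVERY datanode first...
    rm -rf /home/debian/hadoop-1.0.4/FILESYSTEM/data/*

    # ...then format a fresh HDFS on the namenode only
    ~/hadoop-1.0.4/bin/hadoop namenode -format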
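For problem 5), the change itself is tiny; a sketch of what I do on each node, shown here for the master (this assumes sudo is set up for the "debian" user, otherwise do it as root):

    # make the hostname match the name used in the hbase conf files
    echo "namenode" | sudo tee /etc/hostname    # "jobtracker" or "datanode1" on the slaves
    # relogging is not enough: halt the VM completely and start it again
    sudo shutdown -h now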
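And this is the startup order that works for me in the end, using only the commands already mentioned above (run the zookeeper line on all 3 nodes, everything else on the master):

    # 1. start HDFS from the namenode
    ~/hadoop-1.0.4/bin/start-dfs.sh

    # 2. start ZooKeeper by hand on each of the 3 nodes (HBASE_MANAGES_ZK=false),
    #    then wait a few minutes for the quorum to form
    ~/hbase-0.94.5/bin/hbase-daemon.sh start zookeeper

    # 3. check the quorum before going on: inside the zk shell run "ls /"
    #    and make sure it connects on port 2181
    ~/hbase-0.94.5/bin/hbase zkcli

    # 4. finally start HBase (master + regionservers) from the master
    ~/hbase-0.94.5/bin/start-hbase.sh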
2013/4/30 John Foxinhead

> I solved the last problem: I modified the file /etc/hostname and replaced
> the default hostname, "debian01", with "namenode", "jobtracker" or
> "datanode", the hostnames I used in the hbase conf files. Now I start
> hbase from the master with "bin/start-hbase.sh" and the regionservers,
> instead of trying to connect to the master at localhost:60000, connect to
> namenode:60000. Now everything is working well. Thank you all. Later I
> will post my configuration files and make a summary of the problems I
> encountered, so that other users can take advantage of them.
>
>
> 2013/4/30 John Foxinhead
>
>> I solved my problem with zookeeper. I don't know how, maybe it was a
>> spell xD
>> I did it this way: on a slave I removed the hbase directory and copied
>> over the pseudo-distributed hbase directory (which works). Then I copied
>> all the configuration from the virtual machine that ran as master into
>> the new directory, making it distributed. Then I cloned the virtual
>> machine 2 times, adjusted /etc/network/interfaces to set the proper IP
>> on each VM, and then zookeeper magically worked. All the configuration
>> was the same. Maybe I had made some wrong configuration in some OS file,
>> or there was some rubbish left by the hundreds of tries I made on the
>> master; in any case, changing which VM works as master solved my problem.
>> Now:
>> - I start HDFS with "$ ~/hadoop-1.0.4/bin/start-dfs.sh"
>> - I try some commands from the hadoop shell to make sure it works (I
>>   found out that the directories on the local fs that the datanodes and
>>   the namenode use as storage for HDFS blocks need permission 755;
>>   otherwise, even with wider permissions, when you put a file into HDFS
>>   the file is created but its content isn't transferred, so when you get
>>   the file back it is empty)
>> - I start zookeeper on my 3 VMs with
>>   "$ ~/hbase-0.94.5/bin/hbase-daemon.sh start zookeeper" and wait 2-3
>>   minutes to be sure zookeeper has started completely. Then I check the
>>   logs for errors or warnings, and I use "$ ~/hbase-0.94.5/bin/hbase
>>   zkcli" with some "ls" commands to make sure the client connects to
>>   zookeeper on the right node and port (2181). Related to zookeeper, I
>>   found out that with HBASE_MANAGES_ZK=true in hbase-env.sh there was an
>>   error because zookeeper doesn't have time to set up properly before
>>   the hbase master is launched. So, with several VMs (I use 3, and on
>>   one pc they are a lot) it's better to set HBASE_MANAGES_ZK=false and
>>   start zookeeper manually on the nodes, so that you can wait until
>>   zookeeper is up before launching the master.
>> - Everything works properly up to here, so I start hbase with
>>   "$ ~/hbase-0.94.5/bin/start-hbase.sh". Now the output shows that the
>>   master also launches the regionservers on the regionserver nodes
>>   (good, because before it only showed that the master was launched on
>>   localhost and nothing about the regionservers). The log files in both
>>   the master's and the regionservers' logs directories show that the
>>   hbase daemons connect properly to the zookeeper cluster listed in the
>>   hbase.zookeeper.quorum property in hbase-site.xml, and the port is
>>   also right (2181, the same used by the zkcli tool).
>>
>> Now the problem is that the master starts on localhost:60000, not on
>> namenode:60000, so on the master node it's fine, but when the
>> regionservers try to connect to the master at localhost:60000 they
>> (naturally) find nothing there, a MasterNotRunningException is thrown,
>> and the regionservers, after connecting to zookeeper, crash because of
>> that.
>> I found out from the log files on the regionservers that they connect to
>> the zookeeper cluster and then crash because they don't find a running
>> master on localhost:60000, so that part is consistent. But the strange
>> thing is that in the conf files I never used "localhost". I also tried
>> setting the property hbase.master to namenode:60000, but this property
>> hasn't been used for years, so it doesn't work anymore. What can I do?