Subject: Re: Re: While starting 3-nodes cluster hbase: WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null
From: John Foxinhead <john.foxinhead@gmail.com>
To: user@hbase.apache.org
Date: Tue, 30 Apr 2013 17:47:38 +0200

Now I post my configurations.

I use a 3-node cluster with all nodes running Hadoop, ZooKeeper and HBase. The HBase master, a ZooKeeper daemon and the Hadoop namenode run on the same host; an HBase regionserver, a ZooKeeper daemon and a Hadoop datanode run on each of the other 2 nodes. I called one of the datanodes "jobtracker" because of the various configurations I tried: I also configured a jobtracker while installing Hadoop, but I never used that node as a jobtracker, only as a datanode like "datanode1", because HBase does not need MapReduce. I run everything on the same pc: the 3 nodes are 3 virtual machines running on VirtualBox, connected through "internal network" or "bridged adapter" network interfaces (these are VirtualBox settings). It's important to know this because, with 3 virtual machines on one pc, communication is very slow, especially at startup of Hadoop, ZooKeeper and HBase.
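A quick way to double-check that layout once the daemons are up is jps, which ships with the JDK (so this assumes $JAVA_HOME/bin is on the PATH). Roughly, and only as a sketch of what I would expect rather than an actual transcript:

    $ jps
    # on the master (namenode) the list should include something like:
    #   NameNode, HMaster, HQuorumPeer   (HQuorumPeer is the ZooKeeper daemon
    #                                     started via hbase-daemon.sh)
    # on jobtracker and datanode1:
    #   DataNode, HRegionServer, HQuorumPeer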
HADOOP:

hadoop-env.sh:
    export JAVA_HOME=/usr/lib/jvm/java-7-oracle
    export HADOOP_CLASSPATH=/home/debian/hadoop-1.0.4/lib
    export HADOOP_HEAPSIZE=1000

core-site.xml (property name = value):
    fs.default.name = hdfs://namenode:9000/

hdfs-site.xml:
    dfs.name.dir = /home/debian/hadoop-1.0.4/FILESYSTEM/name
    dfs.data.dir = /home/debian/hadoop-1.0.4/FILESYSTEM/data
    dfs.support.append = true
    dfs.datanode.max.xcievers = 4096

masters:

slaves:
    jobtracker
    datanode1

HBASE:

hbase-env.sh:
    export JAVA_HOME=/usr/lib/jvm/java-7-oracle
    export HBASE_CLASSPATH=/home/debian/hbase-0.94.5/lib
    export HBASE_MANAGES_ZK=false

hbase-site.xml (property name = value):
    dfs.support.append = true
    hbase.rootdir = hdfs://namenode:9000/hbase
    hbase.cluster.distributed = true
    hbase.zookeeper.quorum = namenode,jobtracker,datanode1
    hbase.zookeeper.property.dataDir = /home/debian/hbase-0.94.5/zookeeper/data
    hbase.master = namenode:60000

Note: I think the hbase.master property has not worked for years, so it can probably be deleted, but after a lot of tries my HBase worked with it there, so I left it. I'll try deleting it later.

regionservers:
    jobtracker
    datanode

OS FILES:

/etc/hosts:
    127.0.0.1   localhost
    127.0.0.1   debian01
    # HADOOP
    192.168.1.111   jobtracker
    192.168.1.112   datanode1
    192.168.1.121   namenode
    # The following lines are desirable for IPv6 capable hosts
    ::1     ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters

/etc/hostname:
    namenode   (or jobtracker, or datanode1, depending on the node)

/etc/network/interfaces (to set static IPs: address 192.168.1.121 on namenode, 192.168.1.111 on jobtracker, 192.168.1.112 on datanode1):
    iface eth6 inet static
        address 192.168.1.121
        netmask 255.255.255.0
        network 192.168.1.0
        broadcast 192.168.1.255
        gateway 192.168.1.254
        dns-nameserver 8.8.8.8 8.8.4.4

Note: eth6 because eth2 (where I had the "bridged network adapter" virtual interface) was remapped to eth6 (you can verify this with "$ dmesg | grep eth"), so replace eth6 with your own interface.

MY PROBLEMS (note that I copied the hbase and hadoop directories from the working pseudo-distributed installation, so the pseudo-distributed version works):

1) After starting up Hadoop and trying some shell commands to put a file into the Hadoop filesystem and later get the same file back from HDFS, I get the file, but it is empty.
SOLUTION: the FILESYSTEM, FILESYSTEM/data and FILESYSTEM/name directories must have 755 (rwxr-xr-x) permissions.

2) After starting up Hadoop and trying some shell commands to put a file into HDFS and later get it back, I get warnings/errors (in the log files) about a mismatch between the expected and the received block IDs.
EXPLANATION: this can happen if, after using an HDFS (for example putting files into it), I run "bin/hadoop namenode -format" to format a new HDFS, having changed dfs.data.dir and dfs.name.dir to a persistent location (the default is a tmp location, which is cleared when the OS restarts). "bin/hadoop namenode -format" formats the dfs.name.dir directory and generates a new ID for the HDFS blocks, but it does not format the dfs.data.dir directories on the datanodes, so the datanodes still expect the old block IDs and there is a mismatch.
SOLUTION: clear all dfs.data.dir directories on all datanodes, then format a new filesystem with "bin/hadoop namenode -format" on the namenode. (The commands for 1) and 2) are sketched after the summary below.)

3) HBase, while managing ZooKeeper itself, can't connect to ZooKeeper.
SOLUTION: set HBASE_MANAGES_ZK=false, so that HBase does not manage ZooKeeper.
This is recommended in my case because I launch 3 virtual machines, so HBase reaches its retry limit before the ZooKeeper cluster has started up completely. So I run ZooKeeper on all 3 nodes with "$ bin/hbase-daemon.sh start zookeeper" and wait a few minutes; this is because of the slow connection between the 3 virtual machines. Then I test the ZooKeeper cluster with some "ls /" commands from the zk shell (launch it with "$ bin/hbase zkcli") and I make sure the shell (which is a zk client) connects to the right node on the right port.

4) HBase, without managing ZooKeeper, can't connect to ZooKeeper. All the configuration was right, as written above, but HBase launched a 1-node ZooKeeper cluster on the master at localhost and connected to it, and the master also did not start the regionservers. It's a strange problem.
SOLUTION: this solution is as strange as the problem. The configuration files were right, but HBase didn't work, so I opened a regionserver virtual machine, completely removed the hbase directory, copied the pseudo-distributed hbase directory and renamed it like the previous one. I manually copied all the configuration files from the hbase directory on the master. Then I shut down all the virtual machines, made a backup of the old master, and deleted all the VMs except the backup and the slave VM into which I had re-copied the configuration files. I made 2 more clones of this virtual machine (with the new hbase folder) and modified only /etc/network/interfaces, setting the proper IP on each of the VMs. After that HBase was able to connect to the ZooKeeper cluster and to start the regionservers. I think it was because of some rubbish left behind by the many tries I made on the master node, so copying the conf files to a slave node and making it the new master solved my problem. Then I made another backup, to keep a system clean of future rubbish and avoid problems like this.

5) HBase connects to the running ZooKeeper cluster, but there is one last problem: the master launches the regionservers, but on the regionserver nodes, when the regionserver daemon starts, it tries to connect to the master at localhost:60000 instead of namenode:60000.
SOLUTION: the hbase.master property is useless because it has not been supported for years, so the problem is the file /etc/hostname. Its content was "debian01" on all the nodes, but it should be "namenode" on the namenode, "datanode" on the datanode and "jobtracker" on the jobtracker (the hostname used in the hbase conf files for each node). This was my last configuration change; when I changed this too, HBase finally worked properly. Note: just relogging will not make the change to /etc/hostname effective; in fact, when you relog you will still see something like "debian@debian01" even if you already replaced "debian01" with "namenode". You need to completely shut down the OS and restart it for the change to take effect. (A small sketch of this follows below.)

Now Hadoop, ZooKeeper and HBase work, and some jars I compiled to test simple operations like Put and Get, not from the hbase shell but through the HBase Java API, work too. Thank you all, and I hope someone else can take advantage of my issues.
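To make problems 1) and 2) concrete, this is roughly what the fix looks like from the shell; the paths are the dfs.name.dir / dfs.data.dir values from my hdfs-site.xml above, so adapt them to your own setup:

    # problem 1: the local directories backing HDFS need 755 (rwxr-xr-x)
    chmod 755 /home/debian/hadoop-1.0.4/FILESYSTEM \
              /home/debian/hadoop-1.0.4/FILESYSTEM/name \
              /home/debian/hadoop-1.0.4/FILESYSTEM/data

    # problem 2: clear dfs.data.dir on EVERY datanode first...
    rm -rf /home/debian/hadoop-1.0.4/FILESYSTEM/data/*

    # ...then format a fresh HDFS on the namenode only
    ~/hadoop-1.0.4/bin/hadoop namenode -format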
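For problem 5), the change itself is tiny; a sketch of what I do on each node, shown here for the master (this assumes sudo is set up for the "debian" user, otherwise do it as root):

    # make the hostname match the name used in the hbase conf files
    echo "namenode" | sudo tee /etc/hostname    # "jobtracker" or "datanode1" on the slaves
    # relogging is not enough: halt the VM completely and start it again
    sudo shutdown -h now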
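And this is the startup order that works for me in the end, using only the commands already mentioned above (run the zookeeper line on all 3 nodes, everything else on the master):

    # 1. start HDFS from the namenode
    ~/hadoop-1.0.4/bin/start-dfs.sh

    # 2. start ZooKeeper by hand on each of the 3 nodes (HBASE_MANAGES_ZK=false),
    #    then wait a few minutes for the quorum to form
    ~/hbase-0.94.5/bin/hbase-daemon.sh start zookeeper

    # 3. check the quorum before going on: inside the zk shell run "ls /"
    #    and make sure it connects on port 2181
    ~/hbase-0.94.5/bin/hbase zkcli

    # 4. finally start HBase (master + regionservers) from the master
    ~/hbase-0.94.5/bin/start-hbase.sh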
2013/4/30 John Foxinhead

> I solved the last problem: I modified the file /etc/hostname and replaced
> the default hostname, "debian01", with "namenode", "jobtracker" or
> "datanode", the hostnames I used in the hbase conf files. Now I start
> hbase from the master with "bin/start-hbase.sh" and the regionservers,
> instead of trying to connect to the master at localhost:60000, connect to
> namenode:60000. Now everything is working well. Thank you all. Later I
> will post my configuration files and make a summary of the problems I
> encountered, so that other users can take advantage of them.
>
>
> 2013/4/30 John Foxinhead
>
>> I solved my problem with zookeeper. I don't know how, maybe it was a
>> spell xD
>> I did it this way: on a slave I removed the hbase directory and copied
>> over the pseudo-distributed hbase directory (which works). Then I copied
>> all the configuration from the virtual machine that ran as master into
>> the new directory, making it distributed. Then I cloned the virtual
>> machine 2 times, adjusted /etc/network/interfaces to set the proper IP
>> on each VM, and then zookeeper magically worked. All the configuration
>> was the same. Maybe I had made some wrong configuration in some OS file,
>> or there was some rubbish left by the hundreds of tries I made on the
>> master; in any case, changing which VM works as master solved my problem.
>> Now:
>> - I start HDFS with "$ ~/hadoop-1.0.4/bin/start-dfs.sh"
>> - I try some commands from the hadoop shell to make sure it works (I
>>   found out that the directories on the local fs that the datanodes and
>>   the namenode use as storage for HDFS blocks need permission 755;
>>   otherwise, even with wider permissions, when you put a file into HDFS
>>   the file is created but its content isn't transferred, so when you get
>>   the file back it is empty)
>> - I start zookeeper on my 3 VMs with
>>   "$ ~/hbase-0.94.5/bin/hbase-daemon.sh start zookeeper" and wait 2-3
>>   minutes to be sure zookeeper has started completely. Then I check the
>>   logs for errors or warnings, and I use "$ ~/hbase-0.94.5/bin/hbase
>>   zkcli" with some "ls" commands to make sure the client connects to
>>   zookeeper on the right node and port (2181). Related to zookeeper, I
>>   found out that with HBASE_MANAGES_ZK=true in hbase-env.sh there was an
>>   error because zookeeper doesn't have time to set up properly before
>>   the hbase master is launched. So, with several VMs (I use 3, and on
>>   one pc they are a lot) it's better to set HBASE_MANAGES_ZK=false and
>>   start zookeeper manually on the nodes, so that you can wait until
>>   zookeeper is up before launching the master.
>> - Everything works properly up to here, so I start hbase with
>>   "$ ~/hbase-0.94.5/bin/start-hbase.sh". Now the output shows that the
>>   master also launches the regionservers on the regionserver nodes
>>   (good, because before it only showed that the master was launched on
>>   localhost and nothing about the regionservers). The log files in both
>>   the master's and the regionservers' logs directories show that the
>>   hbase daemons connect properly to the zookeeper cluster listed in the
>>   hbase.zookeeper.quorum property in hbase-site.xml, and the port is
>>   also right (2181, the same used by the zkcli tool).
>>
>> Now the problem is that the master starts on localhost:60000, not on
>> namenode:60000, so on the master node it's fine, but when the
>> regionservers try to connect to the master at localhost:60000 they
>> (naturally) find nothing there, a MasterNotRunningException is thrown,
>> and the regionservers, after connecting to zookeeper, crash because of
>> that.
>> I found out from the log files on the regionservers that they connect to
>> the zookeeper cluster and then crash because they don't find a running
>> master on localhost:60000, so that part is consistent. But the strange
>> thing is that in the conf files I never used "localhost". I also tried
>> setting the property hbase.master to namenode:60000, but this property
>> hasn't been used for years, so it doesn't work anymore. What can I do?