Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Reply-To: user@hbase.apache.org
MIME-Version: 1.0
References: <06089C15-4B61-46C4-A7F7-F11F866275B6@gmail.com>
From: Dima Spivak
Date: Mon, 5 Sep 2016 21:51:41 -0700
Subject: Re: HBase on docker NotServingRegionException
because of hostname alisas
To: user@hbase.apache.org
Content-Type: multipart/alternative; boundary=94eb2c129f2ae8c349053bcf8fda
archived-at: Tue, 06 Sep 2016 04:52:27 -0000

--94eb2c129f2ae8c349053bcf8fda
Content-Type: text/plain; charset=UTF-8

Hey Pierre,

Sorry, I just don't think it's worth the time trying to debug this framework
when a more robust one exists. Perhaps try reaching out to "kiwenlau"?

-Dima

On Mon, Sep 5, 2016 at 9:49 PM, Pierre Caserta wrote:

> Thanks Dima,
> Now even if I use a network called hadoopnet.com
> I still have the same problem.
> Here are my regionservers that get detected:
>
> Region Servers
> Base Stats  Memory  Requests  Storefiles  Compactions
>
> ServerName                                                     Start time                    Version  Requests Per Second  Num. Regions
> hadoop-slave1.hadoopnet.com,16020,1473137128613                Tue Sep 06 04:45:28 UTC 2016  1.2.2    0                    0
>   <http://hadoop-slave1.hadoopnet.com:16030/rs-status>
> hadoop-slave1.hadoopnet.com.hadoopnet.com,16020,1473137128613  Tue Sep 06 04:45:28 UTC 2016  Unknown  0                    0
>   <http://hadoop-slave1.hadoopnet.com.hadoopnet.com:60010/rs-status>
> hadoop-slave2.hadoopnet.com,16020,1473137127975                Tue Sep 06 04:45:27 UTC 2016  1.2.2    0                    0
>   <http://hadoop-slave2.hadoopnet.com:16030/rs-status>
> hadoop-slave2.hadoopnet.com.hadoopnet.com,16020,1473137127975  Tue Sep 06 04:45:27 UTC 2016  Unknown  0                    0
>   <http://hadoop-slave2.hadoopnet.com.hadoopnet.com:60010/rs-status>
> Total:4                                                        2 nodes with inconsistent version      0                    0
>
> instead of just
> hadoop-slave1.hadoopnet.com,16020,1473137128613
> <http://hadoop-slave1.hadoopnet.com:16030/rs-status> and
> hadoop-slave2.hadoopnet.com,16020,1473137127975
> <http://hadoop-slave2.hadoopnet.com:16030/rs-status>
>
> This is the script I used to start the hadoop cluster
>
> ---
> #!/bin/bash
>
> # the default node number is 3
> N=${1:-3}
>
>
> NETWORK=hadoopnet.com
> docker rm -f zk.$NETWORK &> /dev/null
> echo "start zk container..."
> docker run -p 2181:2181 --name zk.$NETWORK --hostname zk.$NETWORK --net=$NETWORK -itd -v conf:/opt/zookeeper/conf -v data:/tmp/zookeeper jplock/zookeeper
>
> # start hadoop master container
> docker rm -f hadoop-master.$NETWORK &> /dev/null
> echo "start hadoop-master container..."
> docker run -itd \
>     --net=$NETWORK \
>     -P \
>     --name hadoop-master.$NETWORK \
>     --hostname hadoop-master.$NETWORK \
>     --add-host zk.$NETWORK:$(docker inspect -f "{{with index .NetworkSettings.Networks \"${NETWORK}\"}}{{.IPAddress}}{{end}}" zk.$NETWORK) \
>     casertap/hhb
>
>
> # start hadoop slave container
> i=1
> while [ $i -lt $N ]
> do
>     docker rm -f hadoop-slave$i.$NETWORK &> /dev/null
>     echo "start hadoop-slave$i container..."
>     docker run -itd \
>         --net=$NETWORK \
>         --name hadoop-slave$i.$NETWORK \
>         --hostname hadoop-slave$i.$NETWORK \
>         --publish-all=false \
>         --add-host hadoop-master.$NETWORK:$(docker inspect -f "{{with index .NetworkSettings.Networks \"${NETWORK}\"}}{{.IPAddress}}{{end}}" hadoop-master.$NETWORK) \
>         --add-host zk.$NETWORK:$(docker inspect -f "{{with index .NetworkSettings.Networks \"${NETWORK}\"}}{{.IPAddress}}{{end}}" zk.$NETWORK) \
>         casertap/hhb
>     i=$(( $i + 1 ))
> done
>
> # get into hadoop master container
> docker exec -it hadoop-master.$NETWORK bash
> ---
>
> Thanks,
> pierre
>
> > On 6 Sep 2016, at 08:47, Dima Spivak wrote:
> >
> > Sounds good, Pierre. FWIW, if you want a preview, here's how to get a
> > 5-node HBase cluster running based on the master branch of HBase in
> > about a minute:
> >
> > 1. Source the clusterdock.sh script that defines the clusterdock_ helper
> >    functions:
> >    source /dev/stdin <<< "$(curl -sL http://tiny.cloudera.com/clusterdock.sh)"
> > 2. Start up a cluster:
> >    CLUSTERDOCK_TOPOLOGY_IMAGE=hbasejenkinsuser-docker-hbase.bintray.io/dev/clusterdock:apache_hbase_topology \
> >      clusterdock_run ./bin/start_cluster -r hbasejenkinsuser-docker-hbase.bintray.io \
> >      --namespace dev apache_hbase --hbase-version=master --hadoop-version=2.7.1 \
> >      --secondary-nodes='node-{2..5}'
> >
> > And that's it. Feel free to put a -h for help information (put it right
> > after the ./bin/start_cluster for details about the function or after the
> > apache_hbase for details about the Apache HBase topology).
> >
> > -Dima
> >
> > On Mon, Sep 5, 2016 at 3:44 PM, Pierre Caserta wrote:
> >
> >> Thanks for your answer.
> >> I will check the ticket
> >> https://issues.apache.org/jira/browse/HBASE-15961 regularly and try
> >> clusterdock as soon as the documentation comes out.
> >> I will try to use hostnames with a domain, like master.hadoopnet.com,
> >> and a network named hadoopnet.com, to see if this resolves the problem.
> >> Currently my hostnames are hadoop-master, hadoop-slave1 and
> >> hadoop-slave2, maybe that is the problem.
> >>
> >>> On 5 Sep 2016, at 23:31, Dima Spivak wrote:
> >>>
> >>> clusterdock uses --net=host for running the framework out of a
> >>> container, but each Hadoop/HBase cluster itself runs with its own
> >>> bridge network. Just suggesting clusterdock since it's what we now use
> >>> for testing HBase releases and it looks a bit more sophisticated than
> >>> this other project (e.g. no need to rebuild images for different
> >>> cluster sizes).
> >>>
> >>> The error you're seeing is caused by not using the FQDN of the
> >>> containers when referring to them; Docker networks use the network
> >>> name as the domain.
> >>>
> >>> On Monday, September 5, 2016, Pierre Caserta wrote:
> >>>
> >>>> That is a good script, thanks, but I would like to understand exactly
> >>>> what the problem is with my config without adding another level of
> >>>> abstraction and just running the clusterdock command.
> >>>> In your script I can see that you are using --net=host. I think this
> >>>> is the main difference compared to what I am doing, which is creating
> >>>> a bridge network for the hadoop cluster.
> >>>> I have only 3 machines: hadoop-master, hadoop-slave1, hadoop-slave2.
> >>>>
> >>>> Why do those strange hadoop-slave2.hadoopnet aliases appear in the
> >>>> web ui?
> >>>> It looks like the network name is used as part of the hostname.
> >>>> Any idea what is happening in my case?
> >>>>
> >>>> Pierre
> >>>>
> >>>>> On 5 Sep 2016, at 16:48, Dima Spivak wrote:
> >>>>>
> >>>>> You should try the Apache HBase topology for clusterdock that was
> >>>>> committed a few months back. See HBASE-12721 for details.
> >>>>>
> >>>>> On Sunday, September 4, 2016, Pierre Caserta <pierre.caserta@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>> I am building a fully distributed hbase cluster with unmanaged
> >>>>>> zookeeper.
> >>>>>> I pretty much used this example and installed hbase on top of it:
> >>>>>> https://github.com/kiwenlau/hadoop-cluster-docker
> >>>>>>
> >>>>>> Hadoop and hdfs work fine but I get this exception with hbase:
> >>>>>>
> >>>>>> 2016-09-05 06:27:12,268 INFO [hadoop-master:16000.activeMasterManager]
> >>>>>> zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at
> >>>>>> address=hadoop-slave2,16020,1473052276351,
> >>>>>> exception=org.apache.hadoop.hbase.NotServingRegionException:
> >>>>>> Region hbase:meta,,1 is not online on
> >>>>>> hadoop-slave2.hadoopnet,16020,1473056813966
> >>>>>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2910)
> >>>>>>
> >>>>>> This is blocking because any command I enter on the hbase shell
> >>>>>> will return the following error:
> >>>>>>
> >>>>>> ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is
> >>>>>> initializing
> >>>>>>
> >>>>>> The containers are run using --net=hadoopnet
> >>>>>> which is a network created as such:
> >>>>>>
> >>>>>> docker network create --driver=bridge hadoopnet
> >>>>>>
> >>>>>> The hbase webui is showing this:
> >>>>>>
> >>>>>> Region Servers
> >>>>>> ServerName                                   Start time                    Version  Requests Per Second  Num. Regions
> >>>>>> hadoop-slave1,16020,1473056814064            Mon Sep 05 06:26:54 UTC 2016  1.2.2    0                    0
> >>>>>> hadoop-slave1.hadoopnet,16020,1473056814064  Mon Sep 05 06:26:54 UTC 2016  Unknown  0                    0
> >>>>>> hadoop-slave2,16020,1473056813966            Mon Sep 05 06:26:53 UTC 2016  1.2.2    0                    0
> >>>>>> hadoop-slave2.hadoopnet,16020,1473056813966  Mon Sep 05 06:26:53 UTC 2016  Unknown  0                    0
> >>>>>> Total:4                                      2 nodes with inconsistent version      0                    0
> >>>>>>
> >>>>>> I should have only 2 regionservers but 2 strange entries,
> >>>>>> hadoop-slave1.hadoopnet and hadoop-slave2.hadoopnet, are added to
> >>>>>> the list.
> >>>>>> When I look at zk using:
> >>>>>>
> >>>>>> /usr/local/hbase/bin/hbase zkcli -server zk:2181 ls /hbase/rs
> >>>>>>
> >>>>>> I only see my 2 regionservers: hadoop-slave1,16020,1473056814064 and
> >>>>>> hadoop-slave2,16020,1473056813966
> >>>>>>
> >>>>>> Looking at the zookeeper.MetaTableLocator: Failed verification error
> >>>>>> I see that hadoop-slave2,16020,1473052276351 and
> >>>>>> hadoop-slave2.hadoopnet,16020,1473056813966 get mixed up.
> >>>>>>
> >>>>>> Here is my config on all servers:
> >>>>>>
> >>>>>> <configuration>
> >>>>>>   <property>
> >>>>>>     <name>hbase.rootdir</name>
> >>>>>>     <value>hdfs://hadoop-master:9000/hbase</value>
> >>>>>>     <description>The directory shared by region servers. Should
> >>>>>>     be fully-qualified to include the filesystem to use. E.g:
> >>>>>>     hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR</description>
> >>>>>>   </property>
> >>>>>>   <property>
> >>>>>>     <name>hbase.master</name>
> >>>>>>     <value>hdfs://hadoop-master:60000</value>
> >>>>>>     <description>The host and port that the HBase master runs
> >>>>>>     at.</description>
> >>>>>>   </property>
> >>>>>>   <property>
> >>>>>>     <name>hbase.cluster.distributed</name>
> >>>>>>     <value>true</value>
> >>>>>>     <description>The mode the cluster will be in. Possible
> >>>>>>     values are
> >>>>>>     false: standalone and pseudo-distributed setups with managed
> >>>>>>     Zookeeper
> >>>>>>     true: fully-distributed with unmanaged Zookeeper Quorum (see
> >>>>>>     hbase-env.sh)</description>
> >>>>>>   </property>
> >>>>>>   <property>
> >>>>>>     <name>hbase.master.info.port</name>
> >>>>>>     <value>60010</value>
> >>>>>>     <description>The UI interface of HBase master
> >>>>>>     runs.</description>
> >>>>>>   </property>
> >>>>>>   <property>
> >>>>>>     <name>hbase.zookeeper.quorum</name>
> >>>>>>     <value>zk</value>
> >>>>>>     <description>string m_e_m_b_e_r_s is replaced by list of
> >>>>>>     hosts separated by comma. Its generated by configure-slaves.sh
> >>>>>>     on master node</description>
> >>>>>>   </property>
> >>>>>>   <property>
> >>>>>>     <name>hbase.zookeeper.property.maxClientCnxns</name>
> >>>>>>     <value>300</value>
> >>>>>>   </property>
> >>>>>>   <property>
> >>>>>>     <name>hbase.zookeeper.property.datadir</name>
> >>>>>>     <value>/tmp/zookeeper</value>
> >>>>>>     <description>location of storage of zookeeper
> >>>>>>     data</description>
> >>>>>>   </property>
> >>>>>>   <property>
> >>>>>>     <name>hbase.zookeeper.property.clientPort</name>
> >>>>>>     <value>2181</value>
> >>>>>>   </property>
> >>>>>> </configuration>
> >>>>>>
> >>>>>> I created a stack overflow question as well:
> >>>>>> http://stackoverflow.com/questions/39325041/hbase-on-docker-notservingregionexception-because-of-hostname-alisas
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Pierre
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> -Dima
> >>>>
> >>>>
> >>>
> >>> --
> >>> -Dima
>
>

--94eb2c129f2ae8c349053bcf8fda--
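In the region server listings quoted in this thread, each unexpected entry shares its port and start code with an expected one and differs only in the hostname part. A minimal sketch of a check for that pattern follows, assuming the ServerName strings are copied by hand from the web UI into the script's input. The function name find_aliased_servers is hypothetical and not part of HBase or Docker; it only groups strings and does not query a cluster.

```shell
#!/bin/bash
# Hypothetical helper, not an HBase tool: reads ServerName strings of the
# form host,port,startcode on stdin and prints every port,startcode pair
# that appears on more than one input line, followed by the hostnames
# it was listed under (in input order).
find_aliased_servers() {
  awk -F, '
    NF == 3 {
      key = $2 "," $3
      # Append this hostname to the list already seen for the same key.
      hosts[key] = (key in hosts) ? hosts[key] " " $1 : $1
      count[key]++
    }
    END {
      for (key in count)
        if (count[key] > 1)
          print key ": " hosts[key]
    }
  ' | sort
}

# Example input: the four ServerName values from the first web UI listing
# in the thread.
find_aliased_servers <<'EOF'
hadoop-slave1,16020,1473056814064
hadoop-slave1.hadoopnet,16020,1473056814064
hadoop-slave2,16020,1473056813966
hadoop-slave2.hadoopnet,16020,1473056813966
EOF
```

A pair that shows up here points to one region server process being reported under two hostnames rather than to two separate servers, which matches the short-name versus network-qualified-name mix-up discussed above. It does not by itself say which side of the name resolution needs to change.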