From: Manuel Sopena Ballesteros <manuel.sb@garvan.org.au>
To: "user@hbase.apache.org"
Subject: Re: regionserver can't connect to master
Date: Mon, 23 Mar 2020 05:14:58 +0000
Message-ID: <42a73d223b414e5f93ddfc4b4a2f3641@garvan.org.au>

Hi Jasani,


> Which HBase version are you using?

[luffy@gl-hdp-ctrl03 ~]$ hbase version
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/phoenix/phoenix-5.0.0.3.1.0.0-78-server.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase 2.0.2.3.1.0.0-78
Source code repository git://ctr-e138-1518143905142-586755-01-000023.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hbase revision=
Compiled by jenkins on Thu Dec  6 12:27:45 UTC 2018
From source with checksum 015c34650c163b249d16fc7e496a030e


> You are bringing up a fresh cluster and not doing an upgrade, right?

Yes, this is a fresh cluster that I am deploying through Ambari blueprints (I always reset Ambari to factory settings before deploying the blueprint).


> Has Ambari successfully brought up NameNodes and DataNodes?

I think so.

[inline image: cid:0cef77dd-f616-45ef-8214-e0bb0006b665]

> How many components are already running so far?

{
  "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services",
  "items" : [
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/AMBARI_METRICS",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "AMBARI_METRICS"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/HBASE",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "HBASE"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/HDFS",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "HDFS"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/HIVE",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "HIVE"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/MAPREDUCE2",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "MAPREDUCE2"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/SMARTSENSE",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "SMARTSENSE"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/SPARK2",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "SPARK2"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/TEZ",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "TEZ"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/YARN",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "YARN"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/ZEPPELIN",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "ZEPPELIN"
      }
    },
    {
      "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services/ZOOKEEPER",
      "ServiceInfo" : {
        "cluster_name" : "Grandline",
        "service_name" : "ZOOKEEPER"
      }
    }
  ]
}
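
The response above lists the services defined in the cluster; it does not include their state. A minimal sketch, assuming a response with the same shape as the one above, of extracting the service names in Python. The function name `service_names` and the shortened `SAMPLE` are illustrative and not part of Ambari.

```python
import json

# Shortened sample with the same structure as the Ambari response above.
SAMPLE = """
{
  "href" : "http://10.0.1.245:8080/api/v1/clusters/Grandline/services",
  "items" : [
    { "ServiceInfo" : { "cluster_name" : "Grandline", "service_name" : "HBASE" } },
    { "ServiceInfo" : { "cluster_name" : "Grandline", "service_name" : "HDFS" } },
    { "ServiceInfo" : { "cluster_name" : "Grandline", "service_name" : "ZOOKEEPER" } }
  ]
}
"""


def service_names(response_text):
    """Return the service_name of every entry in the "items" array."""
    doc = json.loads(response_text)
    return [item["ServiceInfo"]["service_name"] for item in doc.get("items", [])]


if __name__ == "__main__":
    # Print one service name per line.
    for name in service_names(SAMPLE):
        print(name)
```

Feeding the full response to `service_names` would give the same eleven names that appear in the listing, which makes it easier to compare against the blueprint.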


> Are they connected (e.g. NN and DN), and only RS is having trouble connecting to HM?

Yes, this is my understanding.


> Although telnet seems correct, can you also try "nc -zv gl-hdp-ctrl03.local 16000" from RS just to double check?

$ nc -zv gl-hdp-ctrl03.local 16000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.20.248:16000.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.
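
For hosts without nc or telnet, roughly the same check can be sketched in Python. This assumes Python 3 is available on the RegionServer host; `can_connect` is a hypothetical helper name, and the 5-second timeout is an arbitrary choice. Like `nc -zv`, it only shows that a TCP connection can be opened from the shell; it does not exercise the HBase RPC path.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and tries each resolved address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Host name and port are the ones used in this thread.
    print(can_connect("gl-hdp-ctrl03.local", 16000))
```

Running it as the same user that starts the RegionServer keeps the test as close as possible to the failing process.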


Thank you


From: Viraj Jasani <vjasani@apache.org>
Sent: Sunday, 22 March 2020 2:47:09 AM
To: user@hbase.apache.org
Subject: Re: regionserver can't connect to master
 
Which HBase version are you using? You are bringing up a fresh cluster and not doing an upgrade, right? Has Ambari successfully brought up NameNodes and DataNodes? How many components are already running so far? Are they connected (e.g. NN and DN), and only RS is having trouble connecting to HM? Although telnet seems correct, can you also try "nc -zv gl-hdp-ctrl03.local 16000" from RS just to double check?
Thanks

On 2020/03/20 23:45:28, Manuel Sopena Ballesteros <manuel.sb@garvan.org.au> wrote:
> Dear HBase community,
>
> I am having an issue with my ambari hbase deployment where regionserver is not able to connect to master
>
> Hbase Master log files:
> 2020-03-21 02:36:53,614 INFO [Thread-16] master.ServerManager: Waiting on regionserver count=0; waited=3174901ms, expecting min=1 server(s), max=NO_LIMIT server(s), timeout=30000ms, lastChange=-3174901ms
> 2020-03-21 02:36:54,287 WARN [master/gl-hdp-ctrl03:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
>
> Hbase region server logs:
> Caused by: java.net.ConnectException: Call to gl-hdp-ctrl03.local/192.168.20.248:16000 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.ConnectTimeoutException: connection timed out: gl-hdp-ctrl03.local/192.168.20.248:16000
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:166)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
> at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
> at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
>
>
> Test connectivity from region server to master
> $ telnet gl-hdp-ctrl03.local 16000
> Trying 192.168.20.248...
> Connected to gl-hdp-ctrl03.local.
> Escape character is '^]'.
>
> Any idea why the regionserver can't connect?
>
> Thank you very much
> NOTICE
> Please consider the environment before printing this email. This message and any attachments are intended for the addressee named and may contain legally privileged/confidential/copyright information. If you are not the intended recipient, you should not read, use, disclose, copy or distribute this communication. If you have received this message in error please notify us at once by return email and then delete both messages. We accept no liability for the distribution of viruses or similar in electronic communications. This notice should not be removed.
>
