accumulo-user mailing list archives

From "Ott, Charles H." <CHARLES.H....@saic.com>
Subject RE: Trying to add tablet servers to accumulo 1.4 cluster
Date Thu, 30 May 2013 14:14:36 GMT
Looks like disabling IPv6 did the trick.  All of my tablet servers are now up and running.
FYI, this is how I disabled IPv6 on CentOS 6.3.  RHEL should be similar.

 

#for persistence

add the following to /etc/sysctl.conf:

net.ipv6.conf.all.disable_ipv6 = 1

net.ipv6.conf.default.disable_ipv6 = 1

 

#during runtime

sysctl -w net.ipv6.conf.all.disable_ipv6=1

sysctl -w net.ipv6.conf.default.disable_ipv6=1

 

I did a # service network restart, just for kicks.

 

All tablet servers are up now.

 

From: user-return-2592-CHARLES.H.OTT=saic.com@accumulo.apache.org [mailto:user-return-2592-CHARLES.H.OTT=saic.com@accumulo.apache.org]
On Behalf Of Sean Busbey
Sent: Thursday, May 23, 2013 9:39 PM
To: user@accumulo.apache.org
Cc: vines@apache.org
Subject: Re: Trying to add tablet servers to accumulo 1.4 cluster

 

It looks like you have IPv6 on. Many parts of Hadoop behave poorly in the presence of IPv6[1],
and Accumulo has known issues on top of that[2].

 

You should verify IPv6 is active and then disable it.
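
One way to do that check, as a minimal sketch: it assumes a Linux host where the kernel exposes the setting as the file /proc/sys/net/ipv6/conf/all/disable_ipv6. The function name ipv6_state is made up for illustration and is not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: report whether IPv6 is enabled, based on the kernel flag file.
# The path argument is optional and defaults to the "all" interfaces setting.
ipv6_state() {
    flag_file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    if [ ! -r "$flag_file" ]; then
        # No readable flag file usually means IPv6 support is not loaded at all.
        echo "unavailable"
        return 0
    fi
    case "$(cat "$flag_file")" in
        1) echo "disabled" ;;
        0) echo "enabled" ;;
        *) echo "unknown" ;;
    esac
}

ipv6_state
```

If this reports "enabled", IPv6 is still active on that node and listeners may show up in netstat with addresses such as :::9999.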

 

Are your configuration files using fully qualified names in both the masters and slaves conf
files? Can both the master and the new nodes do consistent forward and reverse DNS lookups
of each other? In particular, make sure the reverse lookup matches the name used in the conf files.
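
A rough sketch of such a round-trip check follows. It assumes getent is available (it resolves through whatever the host's name service configuration uses, typically /etc/hosts and DNS); the function names normalize_name and check_dns_roundtrip are hypothetical.

```shell
#!/bin/sh
# Normalize a host name for comparison: lowercase, no trailing dot.
normalize_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/\.$//'
}

# Hypothetical helper: forward-resolve a name, reverse-resolve the resulting
# address, and report whether the reverse name matches the name we started with.
check_dns_roundtrip() {
    name="$1"
    addr=$(getent hosts "$name" | awk 'NR==1 {print $1}')
    if [ -z "$addr" ]; then
        echo "$name: forward lookup failed"
        return 1
    fi
    reverse=$(getent hosts "$addr" | awk 'NR==1 {print $2}')
    if [ "$(normalize_name "$reverse")" = "$(normalize_name "$name")" ]; then
        echo "$name: ok ($addr)"
    else
        echo "$name: mismatch ($addr resolves back to '$reverse')"
        return 1
    fi
}

# Example: check_dns_roundtrip "$(hostname -f)"
```

Run it on the master and on each new node, once for every name listed in the masters and slaves files. Any "mismatch" line points at a name that does not survive the forward and reverse lookup.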

 

Are your accumulo/conf directories synced across all nodes?
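
One way to compare them without copying files around is to compute a single digest per conf directory on each node and compare the values. This is only a sketch: it assumes sha256sum from coreutils is installed, and conf_digest is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print one digest summarizing every file under a directory.
# Two directories with identical relative paths and contents give the same digest.
conf_digest() {
    (
        cd "$1" || exit 1
        # Sort the file list so the result does not depend on directory order.
        find . -type f | LC_ALL=C sort | while IFS= read -r f; do
            sha256sum "$f"
        done | sha256sum | awk '{print $1}'
    )
}

# Example, run on each node and compare the printed values:
# conf_digest /path/to/accumulo/conf
```

If two nodes print different digests, at least one file under their conf directories differs or is missing.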

 

[1]: http://wiki.apache.org/hadoop/HadoopIPv6

[2]: https://issues.apache.org/jira/browse/ACCUMULO-547

 

On Thu, May 23, 2013 at 4:25 PM, Ott, Charles H. <CHARLES.H.OTT@saic.com> wrote:

Looks like it is binding to :::9999 like you said.  Is that all interfaces?  Shouldn’t all
tablet servers be able to connect to the master if 1 or 2 can?  One of the tablet servers is
running on the same node the master is running on.  That is the tablet server that seems to be
hosting all the data, and was the first one I set up.

 

[root@1620-accumulo ~]# ps -ef | grep master

 

hdfs      5878     1  0 15:13 ?        00:00:17 /usr/java/jdk1.6.0_38/bin/java -Dapp=master
-classpath /opt/accumulo/accumulo-current/conf:/opt/accumulo/accumulo-current/lib/accumulo-start-1.4.3.jar:/opt/accumulo/accumulo-current/lib/commons-jci-core-1.0.jar:/opt/accumulo/accumulo-current/lib/commons-jci-fam-1.0.jar:/opt/accumulo/accumulo-current/lib/log4j-1.2.16.jar:/opt/accumulo/accumulo-current/lib/commons-logging-1.0.4.jar:/opt/accumulo/accumulo-current/lib/commons-logging-api-1.0.4.jar
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -Xmx128m -Xms128m -XX:OnOutOfMemoryError=kill
-9 %p -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
-Djava.library.path=/opt/hadoop/hadoop-0.20/lib/native/Linux-amd64-64 -Dorg.apache.accumulo.core.home.dir=/opt/accumulo/accumulo-current
-Dhadoop.home.dir=/opt/hadoop/hadoop-0.20 -Dzookeeper.home.dir=/opt/zookeeper/zookeeper-current
org.apache.accumulo.start.Main master --address 1620-accumulo

 

[root@1620-accumulo ~]# netstat -taupen | grep 5878

tcp        0      0 :::9999                     :::*                        LISTEN      496        5229186    5878/java
tcp        0      0 ::ffff:10.35.58.81:39686    ::ffff:10.35.58.81:4560     ESTABLISHED 496        5229684    5878/java
tcp        0      0 ::ffff:10.35.58.81:38327    ::ffff:10.35.56.91:11224    ESTABLISHED 496        5229264    5878/java
tcp        0      0 ::ffff:10.35.58.81:50964    ::ffff:10.35.56.92:9997     ESTABLISHED 496        5229636    5878/java
tcp        0      0 ::ffff:10.35.58.81:50954    ::ffff:10.35.56.92:9997     ESTABLISHED 496        5229594    5878/java
tcp        0      0 ::ffff:10.35.58.81:56097    ::ffff:10.35.56.92:11224    ESTABLISHED 496        5229177    5878/java
tcp        0      0 ::ffff:10.35.58.81:51801    ::ffff:10.35.56.93:11224    ESTABLISHED 496        5229326    5878/java
tcp        0      0 ::ffff:10.35.58.81:38024    ::ffff:10.35.58.81:8020     ESTABLISHED 496        5437808    5878/java
tcp        0      0 ::ffff:10.35.58.81:48057    ::ffff:10.35.58.81:11224    ESTABLISHED 496        5229169    5878/java
tcp        0      0 ::ffff:10.35.58.81:44168    ::ffff:10.35.58.81:9997     ESTABLISHED 496        5229178    5878/java
tcp        0      0 ::ffff:10.35.58.81:44175    ::ffff:10.35.58.81:9997     ESTABLISHED 496        5229195    5878/java
tcp        0      0 ::ffff:10.35.58.81:58099    ::ffff:10.35.58.81:2181     ESTABLISHED 496        5229156    5878/java

 

[root@1620-accumulo ~]# cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

 

 

From: John Vines [mailto:vines@apache.org] 
Sent: Thursday, May 23, 2013 4:02 PM
To: Ott, Charles H.
Cc: user@accumulo.apache.org


Subject: Re: Trying to add tablet servers to accumulo 1.4 cluster

 

Is the node the master is running on accessible from the new nodes? Furthermore, on the
master node, run netstat -nape to see which address the master's port 9999 is bound to. If it's
bound to localhost:9999 rather than :::9999, it may not be accessible from the other nodes regardless.
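
To make that distinction concrete, here is a small sketch that reads netstat-style output on stdin and classifies the listening address for a given port. It assumes the usual net-tools column layout, where the local address is the fourth field; classify_listener is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -tln`-style lines on stdin and classify
# the local address of each LISTEN socket on the given port.
classify_listener() {
    port="$1"
    awk -v port="$port" '
        $1 ~ /^tcp/ && $0 ~ /LISTEN/ {
            local_addr = $4
            # Split "address:port" at the final colon so IPv6 forms like ":::9999" work.
            n = match(local_addr, /:[0-9]+$/)
            if (n == 0) next
            addr = substr(local_addr, 1, n - 1)
            p = substr(local_addr, n + 1)
            if (p != port) next
            if (addr == "0.0.0.0" || addr == "::") kind = "all interfaces"
            else if (addr ~ /^127\./ || addr == "::1") kind = "loopback only"
            else kind = "specific address"
            print addr " -> " kind
        }'
}

# Example: netstat -tln | classify_listener 9999
```

A "loopback only" result would mean that other nodes cannot reach the master on that port, whatever the rest of the configuration says.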

 

On Thu, May 23, 2013 at 3:50 PM, Ott, Charles H. <CHARLES.H.OTT@saic.com> wrote:

My accumulo-site zookeeper location is a DNS entry that resolves to the IP where zookeeper is
installed.  I can ping the server using the server name as well.

 

 

 

From: user-return-2587-CHARLES.H.OTT=saic.com@accumulo.apache.org [mailto:user-return-2587-CHARLES.H.OTT=saic.com@accumulo.apache.org]
On Behalf Of John Vines
Sent: Thursday, May 23, 2013 3:39 PM
To: user@accumulo.apache.org
Subject: Re: Trying to add tablet servers to accumulo 1.4 cluster

 

In your accumulo-site, are you defining the zookeeper location as localhost or a defined IP?
Is that IP accessible?

 

If you need to change it, bring down your existing cluster before you edit the file.
Otherwise you will get errors when the servers try to talk to one another.

 

On Thu, May 23, 2013 at 3:37 PM, Ott, Charles H. <CHARLES.H.OTT@saic.com> wrote:


        I set up Accumulo 1.4.3 with a single hdfs data node and tablet
server.  I added a bit of data to it, and now that my additional hardware
resources have been freed up I am trying to add 3 additional tablet
servers.  I already set up 3 hdfs datanodes, so I wanted to just run the
tserver processes on the same 3 servers:

Node1, Node2, Node3


I keep seeing this error with one or two nodes:

Uncaught exception in TabletServer.main, exiting
        java.lang.RuntimeException: java.lang.RuntimeException: Too many retries, exiting.
                at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2684)
                at org.apache.accumulo.server.tabletserver.TabletServer.run(TabletServer.java:2703)
                at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3168)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.accumulo.start.Main$1.run(Main.java:89)
                at java.lang.Thread.run(Thread.java:662)
        Caused by: java.lang.RuntimeException: Too many retries, exiting.
                at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2681)
                ... 8 more


But I'm not sure what it means.  I run ./stop-here.sh and then
./start-here.sh on the tablet server in question, but it still does the
same thing.  What is weird is that when I do stop-all/start-all from the
master, at most I have seen 2 tablet servers up, but I can't seem to get
all 3 up at once.

The only locations I know the tserver processes are writing data to are:
/var/lib/accumulo/walogs & /opt/accumulo/accumulo-current/logs

Not sure what I am doing wrong here.

 

 





 

-- 
Sean Busbey

Solutions Architect

Cloudera, Inc.

Phone: MAN-VS-BEARD
