accumulo-user mailing list archives

From "Ott, Charles H." <CHARLES.H....@saic.com>
Subject RE: Trying to add tablet servers to accumulo 1.4 cluster
Date Thu, 23 May 2013 20:25:04 GMT
Looks like it is binding to :::9999, like you said.  Is that all
interfaces?  Shouldn't all tablet servers be able to connect to the
master if 1 or 2 can?  One of the tablet servers is running on the same
node as the master.  That is the tablet server that seems to be hosting
all the data, and it was the first one I set up.

 

[root@1620-accumulo ~]# ps -ef | grep master
hdfs      5878     1  0 15:13 ?        00:00:17 /usr/java/jdk1.6.0_38/bin/java -Dapp=master -classpath /opt/accumulo/accumulo-current/conf:/opt/accumulo/accumulo-current/lib/accumulo-start-1.4.3.jar:/opt/accumulo/accumulo-current/lib/commons-jci-core-1.0.jar:/opt/accumulo/accumulo-current/lib/commons-jci-fam-1.0.jar:/opt/accumulo/accumulo-current/lib/log4j-1.2.16.jar:/opt/accumulo/accumulo-current/lib/commons-logging-1.0.4.jar:/opt/accumulo/accumulo-current/lib/commons-logging-api-1.0.4.jar -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -Xmx128m -Xms128m -XX:OnOutOfMemoryError=kill -9 %p -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl -Djava.library.path=/opt/hadoop/hadoop-0.20/lib/native/Linux-amd64-64 -Dorg.apache.accumulo.core.home.dir=/opt/accumulo/accumulo-current -Dhadoop.home.dir=/opt/hadoop/hadoop-0.20 -Dzookeeper.home.dir=/opt/zookeeper/zookeeper-current org.apache.accumulo.start.Main master --address 1620-accumulo

 

[root@1620-accumulo ~]# netstat -taupen | grep 5878
tcp        0      0 :::9999                     :::*                        LISTEN      496        5229186    5878/java
tcp        0      0 ::ffff:10.35.58.81:39686    ::ffff:10.35.58.81:4560     ESTABLISHED 496        5229684    5878/java
tcp        0      0 ::ffff:10.35.58.81:38327    ::ffff:10.35.56.91:11224    ESTABLISHED 496        5229264    5878/java
tcp        0      0 ::ffff:10.35.58.81:50964    ::ffff:10.35.56.92:9997     ESTABLISHED 496        5229636    5878/java
tcp        0      0 ::ffff:10.35.58.81:50954    ::ffff:10.35.56.92:9997     ESTABLISHED 496        5229594    5878/java
tcp        0      0 ::ffff:10.35.58.81:56097    ::ffff:10.35.56.92:11224    ESTABLISHED 496        5229177    5878/java
tcp        0      0 ::ffff:10.35.58.81:51801    ::ffff:10.35.56.93:11224    ESTABLISHED 496        5229326    5878/java
tcp        0      0 ::ffff:10.35.58.81:38024    ::ffff:10.35.58.81:8020     ESTABLISHED 496        5437808    5878/java
tcp        0      0 ::ffff:10.35.58.81:48057    ::ffff:10.35.58.81:11224    ESTABLISHED 496        5229169    5878/java
tcp        0      0 ::ffff:10.35.58.81:44168    ::ffff:10.35.58.81:9997     ESTABLISHED 496        5229178    5878/java
tcp        0      0 ::ffff:10.35.58.81:44175    ::ffff:10.35.58.81:9997     ESTABLISHED 496        5229195    5878/java
tcp        0      0 ::ffff:10.35.58.81:58099    ::ffff:10.35.58.81:2181     ESTABLISHED 496        5229156    5878/java

 

[root@1620-accumulo ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

 

 

From: John Vines [mailto:vines@apache.org] 
Sent: Thursday, May 23, 2013 4:02 PM
To: Ott, Charles H.
Cc: user@accumulo.apache.org
Subject: Re: Trying to add tablet servers to accumulo 1.4 cluster

 

Is the node the master is running on accessible from the new nodes?
Furthermore, on the master node, run netstat -nape to see which address
the master is bound to; it should be :::9999. If it's bound to
localhost:9999, then it may not be accessible from the other nodes
regardless.
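
One way to answer the reachability question from a new node is a plain
TCP connect test against the master's port. The following is only a
minimal sketch: the host name 1620-accumulo and port 9999 are taken from
this thread, while the function name can_connect and the timeout value
are illustrative assumptions, not part of Accumulo.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example, run on a node whose tserver will not start
# (host and port as seen in this thread):
#   can_connect("1620-accumulo", 9999)
```

A False result only says the connection could not be made from that
node; it does not distinguish a firewall from a name that fails to
resolve there.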

 

On Thu, May 23, 2013 at 3:50 PM, Ott, Charles H.
<CHARLES.H.OTT@saic.com> wrote:

My accumulo-site zookeeper location is a DNS entry that resolves to the
IP where zookeeper is installed.  I can ping the server using the server
name as well.

 

 

 

From: user-return-2587-CHARLES.H.OTT=saic.com@accumulo.apache.org
[mailto:user-return-2587-CHARLES.H.OTT=saic.com@accumulo.apache.org] On
Behalf Of John Vines
Sent: Thursday, May 23, 2013 3:39 PM
To: user@accumulo.apache.org
Subject: Re: Trying to add tablet servers to accumulo 1.4 cluster

 

In your accumulo-site, are you defining the zookeeper location as
localhost or a defined IP? Is that IP accessible?
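
A quick way to check this on each node is to look at what the configured
name actually resolves to there, and whether any of the results is a
loopback address. This is a rough sketch only: resolved_addresses and
resolves_to_loopback are hypothetical helper names, and the host to pass
in is whatever is configured as the zookeeper location in accumulo-site.

```python
import ipaddress
import socket


def resolved_addresses(host):
    """Return the sorted, de-duplicated IP addresses host resolves to on this node."""
    infos = socket.getaddrinfo(host, None)
    # sockaddr[0] is the address string; drop any IPv6 scope id such as "%eth0".
    return sorted({info[4][0].split("%")[0] for info in infos})


def resolves_to_loopback(host):
    """Return True if any address for host is a loopback address."""
    return any(ipaddress.ip_address(a).is_loopback
               for a in resolved_addresses(host))
```

If the name resolves to a loopback address on a remote node, servers on
that node would look for the service on themselves rather than on the
intended machine.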

 

If you need to change it, I will preface this by saying that you need to
bring down your existing cluster before you change the file; otherwise
you will get errors from the servers talking to one another.

 

On Thu, May 23, 2013 at 3:37 PM, Ott, Charles H.
<CHARLES.H.OTT@saic.com> wrote:


        I set up Accumulo 1.4.3 with a single HDFS data node and tablet
server.  I added a bit of data to it, and now that my additional
hardware resources have been freed up, I am trying to add 3 additional
tablet servers.  I had already set up 3 HDFS datanodes, so I wanted to
just run the tserver processes on the same 3 servers:

Node1, Node2, Node3


I keep seeing this error with one or two nodes:

Uncaught exception in TabletServer.main, exiting
        java.lang.RuntimeException: java.lang.RuntimeException: Too many retries, exiting.
                at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2684)
                at org.apache.accumulo.server.tabletserver.TabletServer.run(TabletServer.java:2703)
                at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3168)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.accumulo.start.Main$1.run(Main.java:89)
                at java.lang.Thread.run(Thread.java:662)
        Caused by: java.lang.RuntimeException: Too many retries, exiting.
                at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2681)
                ... 8 more


But I am not sure what it means.  I run ./stop-here.sh and then
./start-here.sh on the tablet server in question, but it still does the
same thing.  What is weird is that when I do stop-all/start-all from the
master, at most I have seen 2 tablet servers up; I can't seem to get
all 3 up at once.

The only locations I know the tserver processes are writing data to are:
/var/lib/accumulo/walogs & /opt/accumulo/accumulo-current/logs

Not sure what I am doing wrong here.

 

 

