accumulo-user mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject Re: Trying to add tablet servers to accumulo 1.4 cluster
Date Fri, 24 May 2013 01:39:04 GMT
It looks like you have IPv6 on. Many parts of Hadoop behave poorly in the
presence of IPv6[1], and Accumulo has known issues on top of that[2].

You should verify whether IPv6 is active and, if it is, disable it.
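
The `::ffff:10.35.58.81`-style addresses in the quoted netstat output are IPv4-mapped IPv6 addresses, which means the JVM is talking IPv4 through IPv6 sockets. As a minimal sketch (plain Java 6-compatible code, not part of Accumulo or Hadoop; the class name `Ipv6Check` is hypothetical), the class below lists the IPv6 addresses the JVM sees on a host and reports whether `java.net.preferIPv4Stack` is set. Starting the JVM with `-Djava.net.preferIPv4Stack=true` is the workaround commonly suggested for Hadoop (see [1]). Run this check without that flag, since with it set the JVM may not report IPv6 addresses at all.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class Ipv6Check {

    /** Parses a numeric address literal; for literals only the format is checked, no DNS lookup. */
    static InetAddress literal(String ip) {
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + ip, e);
        }
    }

    /** Keeps only IPv6 addresses that are neither loopback (::1) nor link-local (fe80::/10). */
    static List<InetAddress> routableIpv6(List<InetAddress> addrs) {
        List<InetAddress> out = new ArrayList<InetAddress>();
        for (InetAddress a : addrs) {
            if (a instanceof Inet6Address && !a.isLoopbackAddress() && !a.isLinkLocalAddress()) {
                out.add(a);
            }
        }
        return out;
    }

    /** Collects every address assigned to every local network interface. */
    static List<InetAddress> localAddresses() throws SocketException {
        List<InetAddress> all = new ArrayList<InetAddress>();
        Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
        if (nifs == null) {
            return all; // older JDKs return null when no interfaces are found
        }
        for (NetworkInterface nif : Collections.list(nifs)) {
            all.addAll(Collections.list(nif.getInetAddresses()));
        }
        return all;
    }

    public static void main(String[] args) throws SocketException {
        List<InetAddress> all = localAddresses();
        int v6 = 0;
        for (InetAddress a : all) {
            if (a instanceof Inet6Address) {
                v6++;
            }
        }
        System.out.println("IPv6 addresses on this host: " + v6);
        System.out.println("Non-loopback, non-link-local IPv6: " + routableIpv6(all));
        // Prints null unless the JVM was started with -Djava.net.preferIPv4Stack=...
        System.out.println("java.net.preferIPv4Stack = " + System.getProperty("java.net.preferIPv4Stack"));
    }
}
```

A non-zero IPv6 count (even just `::1`) means IPv6 is enabled at the OS level; how you disable it depends on your distribution.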

Are your configuration files using fully qualified names in both the
masters and slaves conf files? Can both the master and the new nodes do
consistent forward and reverse DNS lookups of each other? In particular,
make sure the reverse lookup matches the name used in the conf files.
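
A quick way to run that forward/reverse check from inside a JVM is sketched below, assuming you pass it the host names exactly as written in conf/masters and conf/slaves (the class name `DnsCheck` and the example names are hypothetical). For each name it does a forward lookup, then a reverse lookup via `InetAddress.getCanonicalHostName()`, which falls back to the bare IP string if the reverse lookup fails, and flags any mismatch.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

public class DnsCheck {

    /** DNS names are case-insensitive and a fully qualified name may carry a trailing dot. */
    static String normalize(String name) {
        String n = name.trim().toLowerCase(Locale.ROOT);
        return n.endsWith(".") ? n.substring(0, n.length() - 1) : n;
    }

    /** True if the configured name and the reverse-lookup result name the same host. */
    static boolean sameHost(String configured, String reverse) {
        return normalize(configured).equals(normalize(reverse));
    }

    public static void main(String[] args) throws UnknownHostException {
        for (String configured : args) {
            // Forward lookup: every address the resolver returns for the configured name.
            for (InetAddress addr : InetAddress.getAllByName(configured)) {
                // Reverse lookup of that address (IP string if no PTR record is found).
                String reverse = addr.getCanonicalHostName();
                System.out.printf("%s -> %s -> %s  %s%n", configured, addr.getHostAddress(), reverse,
                        sameHost(configured, reverse) ? "OK" : "MISMATCH");
            }
        }
    }
}
```

For example, running `java DnsCheck node1.example.com node2.example.com` on the master and on each new node should print OK on every line; a short name in the conf files that reverses to a fully qualified name shows up as MISMATCH.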

Are your accumulo/conf directories synced across all nodes?
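
One way to confirm the conf directories match is to print a checksum per file on each node and diff the listings. The sketch below (hypothetical class `ConfDigest`, Java 6-compatible) defaults to /opt/accumulo/accumulo-current/conf, the conf directory on the master's classpath in the quoted ps output; pass a different path as the first argument if your layout differs.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ConfDigest {

    /** Hex-encoded SHA-256 of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Standard JDKs ship SHA-256, so this branch should be unreachable.
            throw new IllegalStateException(e);
        }
    }

    /** Reads a whole file into memory; fine for small config files. */
    static byte[] readAll(File f) throws IOException {
        byte[] buf = new byte[(int) f.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(f));
        try {
            in.readFully(buf);
        } finally {
            in.close();
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : "/opt/accumulo/accumulo-current/conf");
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IOException("not a readable directory: " + dir);
        }
        Arrays.sort(files); // name order, so listings from different nodes line up for diff
        for (File f : files) {
            if (f.isFile()) {
                System.out.println(sha256Hex(readAll(f)) + "  " + f.getName());
            }
        }
    }
}
```

Redirect the output to a file on each node and diff the files; any differing line points at a conf file that is out of sync.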

[1]: http://wiki.apache.org/hadoop/HadoopIPv6
[2]: https://issues.apache.org/jira/browse/ACCUMULO-547


On Thu, May 23, 2013 at 4:25 PM, Ott, Charles H. <CHARLES.H.OTT@saic.com> wrote:

> Looks like it is binding to :::9999 like you said.  Is that all
> interfaces?  Shouldn’t all tablet servers be able to connect to the master
> if 1 or 2 can?  One of the tablet servers is running on the same node the
> master is running on.  That is the tablet server that seems to be hosting
> all the data, and was the first tablet server I set up.
>
> [root@1620-accumulo ~]# ps -ef | grep master
>
> hdfs      5878     1  0 15:13 ?        00:00:17
> /usr/java/jdk1.6.0_38/bin/java -Dapp=master -classpath
> /opt/accumulo/accumulo-current/conf:/opt/accumulo/accumulo-current/lib/accumulo-start-1.4.3.jar:/opt/accumulo/accumulo-current/lib/commons-jci-core-1.0.jar:/opt/accumulo/accumulo-current/lib/commons-jci-fam-1.0.jar:/opt/accumulo/accumulo-current/lib/log4j-1.2.16.jar:/opt/accumulo/accumulo-current/lib/commons-logging-1.0.4.jar:/opt/accumulo/accumulo-current/lib/commons-logging-api-1.0.4.jar
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -Xmx128m
> -Xms128m -XX:OnOutOfMemoryError=kill -9 %p
> -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
> -Djava.library.path=/opt/hadoop/hadoop-0.20/lib/native/Linux-amd64-64
> -Dorg.apache.accumulo.core.home.dir=/opt/accumulo/accumulo-current
> -Dhadoop.home.dir=/opt/hadoop/hadoop-0.20
> -Dzookeeper.home.dir=/opt/zookeeper/zookeeper-current
> org.apache.accumulo.start.Main master --address 1620-accumulo
>
> [root@1620-accumulo ~]# netstat -taupen | grep 5878
>
> tcp        0      0 :::9999                     :::*                        LISTEN      496        5229186    5878/java
> tcp        0      0 ::ffff:10.35.58.81:39686    ::ffff:10.35.58.81:4560     ESTABLISHED 496        5229684    5878/java
> tcp        0      0 ::ffff:10.35.58.81:38327    ::ffff:10.35.56.91:11224    ESTABLISHED 496        5229264    5878/java
> tcp        0      0 ::ffff:10.35.58.81:50964    ::ffff:10.35.56.92:9997     ESTABLISHED 496        5229636    5878/java
> tcp        0      0 ::ffff:10.35.58.81:50954    ::ffff:10.35.56.92:9997     ESTABLISHED 496        5229594    5878/java
> tcp        0      0 ::ffff:10.35.58.81:56097    ::ffff:10.35.56.92:11224    ESTABLISHED 496        5229177    5878/java
> tcp        0      0 ::ffff:10.35.58.81:51801    ::ffff:10.35.56.93:11224    ESTABLISHED 496        5229326    5878/java
> tcp        0      0 ::ffff:10.35.58.81:38024    ::ffff:10.35.58.81:8020     ESTABLISHED 496        5437808    5878/java
> tcp        0      0 ::ffff:10.35.58.81:48057    ::ffff:10.35.58.81:11224    ESTABLISHED 496        5229169    5878/java
> tcp        0      0 ::ffff:10.35.58.81:44168    ::ffff:10.35.58.81:9997     ESTABLISHED 496        5229178    5878/java
> tcp        0      0 ::ffff:10.35.58.81:44175    ::ffff:10.35.58.81:9997     ESTABLISHED 496        5229195    5878/java
> tcp        0      0 ::ffff:10.35.58.81:58099    ::ffff:10.35.58.81:2181     ESTABLISHED 496        5229156    5878/java
>
> [root@1620-accumulo ~]# cat /etc/hosts
>
> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
> ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
>
> From: John Vines [mailto:vines@apache.org]
> Sent: Thursday, May 23, 2013 4:02 PM
> To: Ott, Charles H.
> Cc: user@accumulo.apache.org
> Subject: Re: Trying to add tablet servers to accumulo 1.4 cluster
>
> Is the node the master is running on accessible from the new nodes?
> Furthermore, on the master node, run netstat -nape to check whether the
> master is bound on :::9999. If it's bound to localhost:9999, then it may
> not be accessible from the other nodes regardless.
>
> On Thu, May 23, 2013 at 3:50 PM, Ott, Charles H. <CHARLES.H.OTT@saic.com> wrote:
>
> The ZooKeeper location in my accumulo-site.xml is a DNS name that resolves
> to the IP where ZooKeeper is installed.  I can also ping the server by name.
>
> From: user-return-2587-CHARLES.H.OTT=saic.com@accumulo.apache.org [mailto:user-return-2587-CHARLES.H.OTT=saic.com@accumulo.apache.org] On Behalf Of John Vines
> Sent: Thursday, May 23, 2013 3:39 PM
> To: user@accumulo.apache.org
> Subject: Re: Trying to add tablet servers to accumulo 1.4 cluster
>
> In your accumulo-site.xml, are you defining the ZooKeeper location as
> localhost or as a specific IP? Is that IP accessible?
>
> If you need to change it, note that you must bring down your existing
> cluster before you change the file; otherwise you will get errors from the
> servers talking to one another.
>
> On Thu, May 23, 2013 at 3:37 PM, Ott, Charles H. <CHARLES.H.OTT@saic.com> wrote:
>
>
> I set up Accumulo 1.4.3 with a single HDFS DataNode and tablet server, and
> added a bit of data to it. Now that my additional hardware resources have
> been freed up, I am trying to add 3 more tablet servers. I have already
> set up 3 HDFS DataNodes, so I want to run the tserver processes on those
> same 3 servers:
>
> Node1, Node2, Node3
>
>
> I keep seeing this error with one or two nodes:
>
> Uncaught exception in TabletServer.main, exiting
>         java.lang.RuntimeException: java.lang.RuntimeException: Too many retries, exiting.
>                 at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2684)
>                 at org.apache.accumulo.server.tabletserver.TabletServer.run(TabletServer.java:2703)
>                 at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3168)
>                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>                 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>                 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>                 at java.lang.reflect.Method.invoke(Method.java:597)
>                 at org.apache.accumulo.start.Main$1.run(Main.java:89)
>                 at java.lang.Thread.run(Thread.java:662)
>         Caused by: java.lang.RuntimeException: Too many retries, exiting.
>                 at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2681)
>                 ... 8 more
>
>
> But I'm not sure what it means. I run ./stop-here.sh and then
> ./start-here.sh on the tablet server in question, but it still does the
> same thing. What is weird is that when I run stop-all/start-all from the
> master, I have seen at most 2 tablet servers up; I can't seem to get all 3
> up at once.
>
> The only locations I know the tserver processes write data to are
> /var/lib/accumulo/walogs and /opt/accumulo/accumulo-current/logs.
>
> I'm not sure what I am doing wrong here.
>



-- 
Sean Busbey
Solutions Architect
Cloudera, Inc.
Phone: MAN-VS-BEARD
