accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Remotely Accumulo
Date Thu, 09 Oct 2014 14:54:19 GMT
You can use start-here.sh on the host in question or `start-server.sh 
$hostname tserver`. FWIW, re-invoking start-all should just ignore the 
hosts which already have processes running and just start a tserver on 
the host that died.

2G should be enough to get a connector and read a table. TBH, 256M 
should be enough for that.

Also, the JVM OOME output doesn't include timestamps, so there isn't much more 
to glean from that message other than "it died because it ran out of heap".
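
For what it's worth, about all a script can pull out of that banner is the
reason string and the pid that was killed. A minimal Python sketch, assuming
the banner looks like the tserver*.out excerpt quoted below; the function name
parse_oom_banner is made up for illustration:

```python
import re

# Assumed banner shape, based on the tserver*.out excerpt quoted in this thread.
OOM_RE = re.compile(r"java\.lang\.OutOfMemoryError: (?P<reason>.+)")
KILL_RE = re.compile(r'Executing /bin/sh -c "kill -9 (?P<pid>\d+)"')


def parse_oom_banner(text):
    """Return (reason, pid) pairs for each OOME banner found in text.

    pid is None when no matching "kill -9" line follows the error line.
    """
    events = []
    for line in text.splitlines():
        oom = OOM_RE.search(line)
        if oom:
            events.append([oom.group("reason").strip(), None])
            continue
        kill = KILL_RE.search(line)
        # Attach the killed pid to the most recent error that lacks one.
        if kill and events and events[-1][1] is None:
            events[-1][1] = int(kill.group("pid"))
    return [tuple(e) for e in events]
```

Since the banner itself carries no time, the modification time of the .out
file is probably the closest you will get to knowing when the crash happened.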

What does your accumulo-site.xml look like?

Geoffry Roberts wrote:
> I found the message in tserver*.out. tserver*.err has 0 in it.
>
> I posted last night, life was good, sat down this morning and saw that
> another tserver had crashed, over night, with no activity.  ??  In
> tserver*.out it again says out of heap space.
>
> ACCUMULO_TSERVER_OPTS=-Xmx2G -Xms1G. I would have thought it sufficient.
>
> The fact that the log entries lack timestamps, but have hashmarks,
> makes me wonder if I am reading things correctly.
>
> #
>
> # java.lang.OutOfMemoryError: Java heap space
>
> # -XX:OnOutOfMemoryError="kill -9 %p"
>
> #   Executing /bin/sh -c "kill -9 3241"...
>
>
> Is there a way to start a particular tablet server?
>
>
> On Wed, Oct 8, 2014 at 6:55 PM, Eric Newton <eric.newton@gmail.com
> <mailto:eric.newton@gmail.com>> wrote:
>
>     Did you find the message in the tserver*.out, tserver*.err or the
>     monitor page?
>
>     (Thanks for the follow-up message.)
>
>     On Wed, Oct 8, 2014 at 6:39 PM, Geoffry Roberts
>     <threadedblue@gmail.com <mailto:threadedblue@gmail.com>> wrote:
>
>         Just for the record, I finally got to the bottom of things.  One
>         of my Tservers was running out of memory.  I hadn't noticed.  I
>         had my SA allocate a little more--each node now has 6G up from
>         2G--and things are working better.
>
>         On Oct 8, 2014 10:09 AM, "Josh Elser" <josh.elser@gmail.com
>         <mailto:josh.elser@gmail.com>> wrote:
>
>             jstack is a tool which can be used to tell a Java process to
>             dump the current stack traces for all of its threads. It's
>             usually included with the JDK. `kill -3 $pid` does the
>             same. If the output isn't redirected to your shell
>             automatically, check the stdout of the process you gave as
>             an argument.
>
>             When your client is sitting waiting on data from the
>             tabletserver, you can get the stack traces from the tserver
>             and you should be able to find a thread with scan in the
>             name, along with your client's IP, and we can help debug
>             exactly what the server is doing that is preventing it from
>             returning data to your client.
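
Once you have a dump saved to a file, picking out the interesting threads is
easy to script. A minimal Python sketch, assuming the usual jstack layout in
which each thread starts with a header line beginning with its quoted name and
ends at a blank line; threads_matching is a hypothetical helper, not part of
any Accumulo or JDK tooling:

```python
import re

# Each thread in a dump starts with a header line that begins with the
# quoted thread name; its stack frames follow until a blank line.
HEADER_RE = re.compile(r'^"(?P<name>[^"]*)"')


def threads_matching(dump_text, needle):
    """Return the dump blocks whose thread name contains needle (case-insensitive)."""
    blocks, current = [], None
    for line in dump_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # A header starts a new block; keep it only if the name matches.
            if needle.lower() in header.group("name").lower():
                current = [line]
                blocks.append(current)
            else:
                current = None
        elif current is not None:
            if line.strip():
                current.append(line)
            else:
                current = None  # a blank line ends the current block
    return ["\n".join(block) for block in blocks]
```

Calling threads_matching(open("tserver.jstack").read(), "scan") would then
return only the threads with "scan" in the name; the file name here is just an
example.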
>
>             On Oct 8, 2014 9:43 AM, "Geoffry Roberts"
>             <threadedblue@gmail.com <mailto:threadedblue@gmail.com>> wrote:
>
>                 Thanks Josh.  But what do you mean by "jstack'ing"?  I'm
>                 unfamiliar with that term.  A better question would be
>                 how can one troubleshoot such a thing?
>
>                 btw
>                 I am the sole user on this cluster.
>
>                 On Tue, Oct 7, 2014 at 4:18 PM, Josh Elser
>                 <josh.elser@gmail.com <mailto:josh.elser@gmail.com>> wrote:
>
>                     Ok, this record:
>
>                     tcp        0      0 0.0.0.0:9997          0.0.0.0:*             LISTEN
>
>                     means that your tserver is listening on the correct port on
>                     all interfaces.
>                     There shouldn't be issues connecting to the tserver.
>                     This is also
>                     confirmed by the fact that you authenticated and got
>                     a Connector (this
>                     does an RPC to the tserver).
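
The same reachability check can be made from the client machine without
netstat. A minimal Python sketch, assuming the default tserver client port
9997 discussed in this thread; can_connect and the host name in the usage
line are hypothetical:

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, can_connect("tserver1.example.com", 9997) run from the remote
client. A True result only shows the port is reachable; it says nothing about
whether a scan will return.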
>
>                     So, your tserver is up, and your client can communicate
>                     with it. The real question is why the scan is hanging.
>                     Perhaps try jstack'ing the tserver while your client is
>                     blocked waiting for results.
>
>                     On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts
>                     <threadedblue@gmail.com
>                     <mailto:threadedblue@gmail.com>> wrote:
>                      > "...it's when
>                      > you make a Connector, and your client will talk to a tabletserver to
>                      > authenticate, that your program should hang. It would be good to
>                      > verify that."
>                      >
>                      >
>                      > My program should hang?  Would you expand?  That is exactly what it is
>                      > doing.  I am able to get a connector.  But when I try to iterate the result
>                      > of a scan, that's when it hangs.
>                      >
>                      >
>                      >
>                      >
>                      > Here's what comes from netstat:
>                      >
>                      >
>                      > $ netstat -na | grep 9997
>                      >
>                      > tcp        0      0 0.0.0.0:9997          0.0.0.0:*             LISTEN
>                      > tcp        0      0 204.9.140.36:35679    204.9.140.36:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:53146    204.9.140.37:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33896    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:53282    204.9.140.37:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:53188    204.9.140.37:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:35609    204.9.140.36:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33901    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:35588    204.9.140.36:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33877    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33946    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:53167    204.9.140.37:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33949    204.9.140.38:9997     ESTABLISHED
>                      > tcp        0      0 204.9.140.36:35546    204.9.140.36:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33852    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:53125    204.9.140.37:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33922    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33747    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33961    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33793    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:35768    204.9.140.36:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33917    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33814    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:35567    204.9.140.36:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33444    204.9.140.38:9997     FIN_WAIT2
>                      > tcp        0      0 204.9.140.36:35701    204.9.140.36:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33969    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:53258    204.9.140.37:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33831    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:53210    204.9.140.37:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:53104    204.9.140.37:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33789    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33856    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:53237    204.9.140.37:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33835    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:35651    204.9.140.36:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33938    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33041    204.9.140.36:9997     ESTABLISHED
>                      > tcp        0      0 204.9.140.36:53285    204.9.140.37:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:53305    204.9.140.37:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33768    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:35630    204.9.140.36:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33754    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:35745    204.9.140.36:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:35724    204.9.140.36:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:9997     204.9.140.36:33041    ESTABLISHED
>                      > tcp        0      0 204.9.140.36:53083    204.9.140.37:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:50623    204.9.140.37:9997     ESTABLISHED
>                      > tcp        0      0 204.9.140.36:33772    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33732    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33874    204.9.140.38:9997     TIME_WAIT
>                      > tcp        0      0 204.9.140.36:33810    204.9.140.38:9997     TIME_WAIT
>                      >
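
A listing like the one above is easier to read as counts per remote tserver
and state. A minimal Python sketch, assuming raw Linux `netstat -na` output
(six whitespace-separated columns, no quote prefixes) and port 9997;
summarize_netstat is a hypothetical helper:

```python
from collections import Counter


def summarize_netstat(text, port=9997):
    """Count TCP states per remote address for connections to the given port."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Expected Linux layout: proto recv-q send-q local remote state
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        remote, state = fields[4], fields[5]
        if remote.endswith(":%d" % port):
            counts[(remote, state)] += 1
    return counts
```

Sockets in TIME_WAIT are connections that have already closed; the
ESTABLISHED entries are the ones a hung client could still be waiting on.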
>                      >
>                      > On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <josh.elser@gmail.com <mailto:josh.elser@gmail.com>> wrote:
>                      >>
>                      >> Can you provide the output from netstat, lsof or /proc/$pid/fd for the
>                      >> tserver? Assuming you haven't altered tserv.port.client in
>                      >> accumulo-site.xml, we want the line for port 9997.
>                      >>
>                      >> From my laptop running a tserver on localhost:
>                      >>
>                      >> $ netstat -na | grep 9997
>                      >> tcp4       0      0  127.0.0.1.9997         *.*                    LISTEN
>                      >>
>                      >> Depending on the tool you use, you can grep out the pid of the tserver
>                      >> or just that port itself.
>                      >>
>                      >> Just so you know, ZK binds to all available interfaces when it starts,
>                      >> so it should work seamlessly with localhost or the FQDN for the host.
>                      >> As such, it shouldn't matter what you provide to the
>                      >> ZooKeeperInstance. That should connect in all cases for you, it's when
>                      >> you make a Connector, and your client will talk to a tabletserver to
>                      >> authenticate, that your program should hang. It would be good to
>                      >> verify that.
>                      >>
>                      >> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts
>                     <threadedblue@gmail.com <mailto:threadedblue@gmail.com>>
>                      >> wrote:
>                      >> > All,
>                      >> >
>                      >> > Thanks for the responses.
>                      >> >
>                      >> > Is this a problem for Accumulo?
>                      >> > Reverse DNS is yielding my ISP's host name. You know the drill, my IP in
>                      >> > reverse followed by their domain name, as opposed to my FQDN, which is what
>                      >> > I use in my config files.
>                      >> >
>                      >> > Running Accumulo 1.5.1
>                      >> > I have only one interface.
>                      >> > I have the FQDN in both master and slaves files for both Hadoop and
>                      >> > Accumulo; in zoo.cfg; and in accumulo-site.xml where the Zookeepers are
>                      >> > referenced.
>                      >> > Also, I am passing in all Zk FQDN when I instantiate ZooKeeperInstance.
>                      >> > Forward DNS works
>                      >> > Reverse DNS... well (See above).
>                      >> >
>                      >> >
>                      >> >
>                      >> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs
>                     <afuchs@apache.org <mailto:afuchs@apache.org>> wrote:
>                      >> >>
>                      >> >> Accumulo tservers typically listen on a single interface. If you have a
>                      >> >> server with multiple interfaces (e.g. loopback and eth0), you might
>                      >> >> have a problem in which the tablet servers are not listening on externally
>                      >> >> reachable interfaces. Tablet servers will list the interfaces that they
>                      >> >> are listening to when they boot, and you can also use tools like lsof to
>                      >> >> find them.
>                      >> >>
>                      >> >> If that is indeed the problem, then you might just need to change your
>                      >> >> conf/slaves file to use <hostname> instead of localhost, and then
>                      >> >> restart.
>                      >> >>
>                      >> >> Adam
>                      >> >>
>                      >> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts"
>                     <threadedblue@gmail.com <mailto:threadedblue@gmail.com>>
>                      >> >> wrote:
>                      >> >>>
>                      >> >>>
>                      >> >>> I have been happily working with Acc, but today things changed.  No
>                      >> >>> errors.
>                      >> >>>
>                      >> >>> Until now I ran everything server side, which meant the URL was
>                      >> >>> localhost:2181, and life was good.  Today I tried running some of the
>                      >> >>> same code as a remote client, which means <host name>:2181.  Things hang
>                      >> >>> when BatchWriter tries to commit anything, and Scan hangs when it tries to
>                      >> >>> iterate through a Map.
>                      >> >>>
>                      >> >>> Let's focus on the scan part:
>                      >> >>>
>                      >> >>> scan.fetchColumnFamily(new Text("colfY")); // This executes then hangs.
>                      >> >>> for(Entry<Key,Value> entry : scan) {
>                      >> >>> def row = entry.getKey().getRow();
>                      >> >>> def value = entry.getValue();
>                      >> >>> println "value=" + value;
>                      >> >>> }
>                      >> >>>
>                      >> >>> This is what appears in the console :
>                      >> >>>
>                      >> >>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x148c6f03388005e after 21ms
>                      >> >>>
>                      >> >>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x148c6f03388005e after 21ms
>                      >> >>>
>                      >> >>> <and on and on>
>                      >> >>>
>                      >> >>>
>                      >> >>>
>                      >> >>> The only difference between success and a hang is a URL change, and of
>                      >> >>> course being remote.
>                      >> >>>
>                      >> >>> I don't believe this is a firewall issue.  I shut down the firewall.
>                      >> >>>
>                      >> >>> Am I missing something?
>                      >> >>>
>                      >> >>> Thanks all.
>                      >> >>>
>                      >> >>> --
>                      >> >>> There are ways and there are ways,
>                      >> >>>
>                      >> >>> Geoffry Roberts
>                      >> >
>                      >> >
>                      >> >
>                      >> >
>                      >> > --
>                      >> > There are ways and there are ways,
>                      >> >
>                      >> > Geoffry Roberts
>                      >
>                      >
>                      >
>                      >
>                      > --
>                      > There are ways and there are ways,
>                      >
>                      > Geoffry Roberts
>
>
>
>
>                 --
>                 There are ways and there are ways,
>
>                 Geoffry Roberts
>
>
>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
