accumulo-user mailing list archives

From Geoffry Roberts <threadedb...@gmail.com>
Subject Re: Remotely Accumulo
Date Thu, 09 Oct 2014 19:25:35 GMT
So start-here.sh does it. Thanks for pointing that out. I was looking all
through the shell commands.


I did try running start-all.sh from the master, and it worked for starting
the tserver, but I noticed that on the master it increased the number of
processes labeled "Main" from the usual five to seven.


From accumulo-site.xml, everything memory related:


  <property>
    <name>tserver.memory.maps.max</name>
    <value>256M</value>
  </property>
  <property>
    <name>tserver.memory.maps.native.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>tserver.cache.data.size</name>
    <value>50M</value>
  </property>
  <property>
    <name>tserver.cache.index.size</name>
    <value>100M</value>
  </property>
  <property>
    <name>tserver.walog.max.size</name>
    <value>512M</value>
  </property>
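As a rough cross-check of Josh's remark below that 2G of heap should be plenty, the settings above can be summed. This is a minimal Java sketch, assuming (the thread does not say so) that with `tserver.memory.maps.native.enabled` set to false the in-memory map lives on the Java heap alongside the data and index block caches, while `tserver.walog.max.size` bounds a log file on disk rather than heap. `HeapBudget` and its methods are hypothetical helpers, not part of Accumulo.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: rough on-heap budget for a tserver, assuming native maps are off. */
final class HeapBudget {

    /** Parses sizes such as "256M", "2G" or "512K" into bytes (binary multiples). */
    static long parseSize(String s) {
        String t = s.trim().toUpperCase(Locale.ROOT);
        long multiplier = 1L;
        switch (t.charAt(t.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            default: break; // plain byte count, no suffix
        }
        String digits = multiplier == 1L ? t : t.substring(0, t.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    /** Sums the settings assumed to be held on the Java heap. */
    static long onHeapBytes(Map<String, String> site) {
        long total = parseSize(site.get("tserver.cache.data.size"))
                + parseSize(site.get("tserver.cache.index.size"));
        // Assumption: without native maps, the in-memory map is also on the heap.
        if (!Boolean.parseBoolean(site.get("tserver.memory.maps.native.enabled"))) {
            total += parseSize(site.get("tserver.memory.maps.max"));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, String> site = new LinkedHashMap<>();
        site.put("tserver.memory.maps.max", "256M");
        site.put("tserver.memory.maps.native.enabled", "false");
        site.put("tserver.cache.data.size", "50M");
        site.put("tserver.cache.index.size", "100M");
        long maxHeap = parseSize("2G"); // -Xmx2G from ACCUMULO_TSERVER_OPTS
        long budget = onHeapBytes(site);
        System.out.printf("configured on-heap: %d MiB of %d MiB max heap (%d MiB left for everything else)%n",
                budget >> 20, maxHeap >> 20, (maxHeap - budget) >> 20);
    }
}
```

With the values above this comes to 406 MiB of a 2048 MiB heap. Whatever remains still has to cover scan buffers, RPC handling and other per-request objects, which this sketch does not model.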

On Thu, Oct 9, 2014 at 10:54 AM, Josh Elser <josh.elser@gmail.com> wrote:

> You can use start-here.sh on the host in question or `start-server.sh
> $hostname tserver`. FWIW, re-invoking start-all.sh should just ignore the
> hosts which already have processes running and start a tserver only on
> the host that died.
>
> 2G should be enough to get a connector and read a table. TBH, 256M should
> be enough for that.
>
> Also, the JVM's OOME message doesn't include timestamps, and there isn't
> much more to glean from it other than "it died because it ran out of heap".
>
> What does your accumulo-site.xml look like?
>
> Geoffry Roberts wrote:
>
>> I found the message in tserver*.out. tserver*.err is empty (0 bytes).
>>
>> I posted last night and life was good; I sat down this morning and saw
>> that another tserver had crashed overnight, with no activity. ?? In
>> tserver*.out it again says it ran out of heap space.
>>
>> ACCUMULO_TSERVER_OPTS=-Xmx2G -Xms1G. I would have thought it sufficient.
>>
>> The fact that the log entries lack timestamps but have hash marks
>> makes me wonder if I am reading things correctly.
>>
>> #
>> # java.lang.OutOfMemoryError: Java heap space
>> # -XX:OnOutOfMemoryError="kill -9 %p"
>> #   Executing /bin/sh -c "kill -9 3241"...
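These `#` lines come from the JVM's own out-of-memory handling rather than from Accumulo's logger, which is why they carry no timestamp: `-XX:OnOutOfMemoryError="kill -9 %p"` tells HotSpot to run that command with `%p` replaced by the process id. To check several `tserver*.out` files for this pattern, a hypothetical Java helper could look like the following; it assumes exactly the line formats quoted here.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: inspects the lines HotSpot writes to a tserver*.out file on an OOME. */
final class OomeScan {

    // Matches the handler line, e.g.  #   Executing /bin/sh -c "kill -9 3241"...
    private static final Pattern KILL_LINE = Pattern.compile("Executing .*kill -9 (\\d+)");

    /** True if any line reports a Java heap OutOfMemoryError. */
    static boolean sawHeapOome(List<String> lines) {
        for (String line : lines) {
            if (line.contains("java.lang.OutOfMemoryError: Java heap space")) {
                return true;
            }
        }
        return false;
    }

    /** The pid handed to kill -9 by -XX:OnOutOfMemoryError, if that line is present. */
    static OptionalLong killedPid(List<String> lines) {
        for (String line : lines) {
            Matcher m = KILL_LINE.matcher(line);
            if (m.find()) {
                return OptionalLong.of(Long.parseLong(m.group(1)));
            }
        }
        return OptionalLong.empty();
    }
}
```

For example, `OomeScan.sawHeapOome(Files.readAllLines(Paths.get("logs/tserver_host.out")))`, where the path is only a placeholder for wherever the tserver's `.out` file lives.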
>>
>>
>> Is there a way to start a particular tablet server?
>>
>>
>> On Wed, Oct 8, 2014 at 6:55 PM, Eric Newton <eric.newton@gmail.com> wrote:
>>
>>     Did you find the message in tserver*.out, tserver*.err or the
>>     monitor page?
>>
>>     (Thanks for the follow-up message.)
>>
>>     On Wed, Oct 8, 2014 at 6:39 PM, Geoffry Roberts <threadedblue@gmail.com> wrote:
>>
>>         Just for the record, I finally got to the bottom of things. One
>>         of my tservers was running out of memory. I hadn't noticed. I
>>         had my SA allocate a little more (each node now has 6G, up from
>>         2G), and things are working better.
>>
>>         On Oct 8, 2014 10:09 AM, "Josh Elser" <josh.elser@gmail.com> wrote:
>>
>>             jstack is a tool that tells a Java process to dump the
>>             current stack traces for all of its threads. It's usually
>>             included with the JDK. `kill -3 $pid` does the same. If the
>>             output isn't redirected to your shell automatically, check
>>             the stdout of the process whose pid you gave as an argument.
>>
>>             When your client is sitting waiting on data from the
>>             tabletserver, you can get the stack traces from the tserver.
>>             You should be able to find a thread with "scan" in the name,
>>             along with your client's IP, and we can help debug exactly
>>             what the server is doing that is preventing it from
>>             returning data to your client.
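The filtering step Josh describes (find the thread with "scan" in its name plus the client's IP) can be scripted once the dump is saved, e.g. with `jstack <tserver-pid> > dump.txt`. A Java sketch, assuming HotSpot's dump layout in which each thread block starts with the thread's quoted name and ends at a blank line. The exact naming of tserver scan threads is not given in this thread, so the match is a plain case-insensitive substring check; `ScanThreads` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: filters a jstack / kill -3 dump down to scan-related threads. */
final class ScanThreads {

    /**
     * Returns each matching thread block (quoted header line plus its stack frames).
     * If clientIp is null, every thread whose header mentions "scan" is returned.
     */
    static List<String> scanThreadBlocks(String dump, String clientIp) {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = null;
        for (String line : dump.split("\n", -1)) {
            if (line.startsWith("\"")) {          // header of a new thread block
                flush(blocks, current, clientIp);
                current = new StringBuilder(line).append('\n');
            } else if (line.isEmpty()) {          // blank line ends the block
                flush(blocks, current, clientIp);
                current = null;
            } else if (current != null) {         // state line or stack frame
                current.append(line).append('\n');
            }
        }
        flush(blocks, current, clientIp);
        return blocks;
    }

    private static void flush(List<String> blocks, StringBuilder block, String clientIp) {
        if (block == null) {
            return;
        }
        String header = block.substring(0, block.indexOf("\n"));
        boolean isScan = header.toLowerCase(Locale.ROOT).contains("scan");
        // Plain substring match on the IP, so 10.0.0.5 would also match 10.0.0.50.
        boolean fromClient = clientIp == null || header.contains(clientIp);
        if (isScan && fromClient) {
            blocks.add(block.toString());
        }
    }
}
```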
>>
>>             On Oct 8, 2014 9:43 AM, "Geoffry Roberts" <threadedblue@gmail.com> wrote:
>>
>>                 Thanks Josh. But what do you mean by "jstack'ing"? I'm
>>                 unfamiliar with that term. A better question would be:
>>                 how can one troubleshoot such a thing?
>>
>>                 BTW, I am the sole user on this cluster.
>>
>>                 On Tue, Oct 7, 2014 at 4:18 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>>                     Ok, this record:
>>
>>                     tcp        0      0 0.0.0.0:9997            0.0.0.0:*               LISTEN
>>
>>                     means that your tserver is listening on the correct port
>>                     on all interfaces. There shouldn't be issues connecting to
>>                     the tserver. This is also confirmed by the fact that you
>>                     authenticated and got a Connector (this does an RPC to the
>>                     tserver).
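Josh's reading of that line can be mechanized: a wildcard local address (`0.0.0.0`, or `::` for IPv6) means all interfaces, while a `127.x.x.x` address would mean loopback only, which is the situation Adam describes further down. A hypothetical Java helper, assuming the Linux `netstat -na` column order seen in the output quoted in this thread (the BSD/macOS format, with a dot before the port, is not handled):

```java
import java.util.List;

/** Hypothetical helper: classifies how a TCP port is bound, from Linux `netstat -na` lines. */
final class ListenCheck {

    enum Binding { ALL_INTERFACES, LOOPBACK_ONLY, SPECIFIC_ADDRESS, NOT_LISTENING }

    static Binding bindingFor(List<String> netstatLines, int port) {
        String suffix = ":" + port;
        for (String line : netstatLines) {
            // Assumed layout: proto recv-q send-q local-address foreign-address state
            String[] f = line.trim().split("\\s+");
            if (f.length < 6 || !f[0].startsWith("tcp") || !f[5].equals("LISTEN")) {
                continue; // skip non-TCP lines and connections that are not listeners
            }
            String local = f[3];
            if (!local.endsWith(suffix)) {
                continue;
            }
            String host = local.substring(0, local.length() - suffix.length());
            if (host.equals("0.0.0.0") || host.equals("::")) {
                return Binding.ALL_INTERFACES;   // wildcard bind
            }
            if (host.startsWith("127.") || host.equals("::1")) {
                return Binding.LOOPBACK_ONLY;    // reachable only from this machine
            }
            return Binding.SPECIFIC_ADDRESS;     // bound to one interface address
        }
        return Binding.NOT_LISTENING;
    }
}
```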
>>
>>                     So, your tserver is up, and your client can communicate
>>                     with it. The real question is why the scan is hanging.
>>                     Perhaps try jstack'ing the tserver while your client is
>>                     blocked waiting for results.
>>
>>                     On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts <threadedblue@gmail.com> wrote:
>>                      > "...it's when you make a Connector, and your client will
>>                      > talk to a tabletserver to authenticate, that your program
>>                      > should hang. It would be good to verify that."
>>                      >
>>                      >
>>                      > My program should hang? Would you expand? That is exactly
>>                      > what it is doing. I am able to get a connector. But when I
>>                      > try to iterate the result of a scan, that's when it hangs.
>>                      >
>>                      > Here's what comes from netstat:
>>                      >
>>                      >
>>                      > $ netstat -na | grep 9997
>>                      >
>>                      > tcp        0      0 0.0.0.0:9997            0.0.0.0:*               LISTEN
>>                      > tcp        0      0 204.9.140.36:35679      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53146      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33896      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53282      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53188      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35609      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33901      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35588      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33877      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33946      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53167      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33949      204.9.140.38:9997       ESTABLISHED
>>                      > tcp        0      0 204.9.140.36:35546      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33852      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53125      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33922      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33747      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33961      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33793      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35768      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33917      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33814      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35567      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33444      204.9.140.38:9997       FIN_WAIT2
>>                      > tcp        0      0 204.9.140.36:35701      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33969      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53258      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33831      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53210      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53104      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33789      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33856      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53237      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33835      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35651      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33938      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33041      204.9.140.36:9997       ESTABLISHED
>>                      > tcp        0      0 204.9.140.36:53285      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53305      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33768      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35630      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33754      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35745      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35724      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:9997       204.9.140.36:33041      ESTABLISHED
>>                      > tcp        0      0 204.9.140.36:53083      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:50623      204.9.140.37:9997       ESTABLISHED
>>                      > tcp        0      0 204.9.140.36:33772      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33732      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33874      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33810      204.9.140.38:9997       TIME_WAIT
>>                      >
>>                      >
>>                      > On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <josh.elser@gmail.com> wrote:
>>                      >>
>>                      >> Can you provide the output from netstat, lsof or /proc/$pid/fd
>>                      >> for the tserver? Assuming you haven't altered tserver.port.client
>>                      >> in accumulo-site.xml, we want the line for port 9997.
>>                      >>
>>                      >> From my laptop running a tserver on localhost:
>>                      >>
>>                      >> $ netstat -na | grep 9997
>>                      >> tcp4       0      0  127.0.0.1.9997         *.*                    LISTEN
>>                      >>
>>                      >> Depending on the tool you use, you can grep out the pid of the
>>                      >> tserver or just that port itself.
>>                      >>
>>                      >> Just so you know, ZK binds to all available interfaces when it
>>                      >> starts, so it should work seamlessly with localhost or the FQDN
>>                      >> for the host. As such, it shouldn't matter what you provide to
>>                      >> the ZooKeeperInstance. That should connect in all cases for you;
>>                      >> it's when you make a Connector, and your client talks to a
>>                      >> tabletserver to authenticate, that your program should hang.
>>                      >> It would be good to verify that.
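To check the two hops Josh separates (ZooKeeper on 2181, then the tserver client port, 9997 by default) from the remote client machine, a plain TCP connect with a timeout is enough to tell "refused or filtered" apart from "reachable". A minimal Java sketch; `PortProbe` is a hypothetical helper, and a successful connect only shows that the port accepts connections, not that the tserver is answering scans.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether host:port accepts a TCP connection. */
final class PortProbe {

    /** True if a TCP connection to host:port completes within timeoutMs milliseconds. */
    static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host name did not resolve
        }
    }
}
```

For instance, `PortProbe.reachable("tserver-host.example.com", 9997, 3000)` run on the client for each host listed in `conf/slaves`, where the host name is a placeholder.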
>>                      >>
>>                      >> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <threadedblue@gmail.com> wrote:
>>                      >> > All,
>>                      >> >
>>                      >> > Thanks for the responses.
>>                      >> >
>>                      >> > Is this a problem for Accumulo? Reverse DNS is yielding my ISP's
>>                      >> > host name. You know the drill: my IP in reverse followed by their
>>                      >> > domain name, as opposed to my FQDN, which is what I use in my
>>                      >> > config files.
>>                      >> >
>>                      >> > Running Accumulo 1.5.1.
>>                      >> > I have only one interface.
>>                      >> > I have the FQDN in both the masters and slaves files for both
>>                      >> > Hadoop and Accumulo, in zoo.cfg, and in accumulo-site.xml where
>>                      >> > the ZooKeepers are referenced.
>>                      >> > Also, I am passing in all ZK FQDNs when I instantiate
>>                      >> > ZooKeeperInstance.
>>                      >> > Forward DNS works.
>>                      >> > Reverse DNS... well (see above).
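The forward/reverse mismatch described here can be observed directly from the JDK. A Java sketch, assuming `InetAddress.getCanonicalHostName()` as the reverse lookup (it falls back to the literal IP text when the reverse lookup fails). `DnsCheck` is hypothetical, and nothing in this thread establishes whether Accumulo 1.5 itself relies on reverse DNS.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

/** Hypothetical helper: compares a host's forward and reverse DNS answers. */
final class DnsCheck {

    /** Compares two host names case-insensitively, ignoring a trailing root dot. */
    static boolean sameHost(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    private static String normalize(String name) {
        String n = name.trim().toLowerCase(Locale.ROOT);
        return n.endsWith(".") ? n.substring(0, n.length() - 1) : n;
    }

    /** Forward-resolves fqdn, reverse-resolves each address, and reports whether all match. */
    static boolean reverseMatchesForward(String fqdn) throws UnknownHostException {
        boolean allMatch = true;
        for (InetAddress addr : InetAddress.getAllByName(fqdn)) {
            String reverse = addr.getCanonicalHostName(); // reverse lookup, best effort
            System.out.println(fqdn + " -> " + addr.getHostAddress() + " -> " + reverse);
            allMatch &= sameHost(fqdn, reverse);
        }
        return allMatch;
    }
}
```

Run `DnsCheck.reverseMatchesForward("your.fqdn")` on each node; a `false` with an ISP-style name in the printed line reproduces the situation above.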
>>                      >> >
>>                      >> >
>>                      >> >
>>                      >> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <afuchs@apache.org> wrote:
>>                      >> >>
>>                      >> >> Accumulo tservers typically listen on a single interface. If you
>>                      >> >> have a server with multiple interfaces (e.g. loopback and eth0),
>>                      >> >> you might have a problem in which the tablet servers are not
>>                      >> >> listening on externally reachable interfaces. Tablet servers will
>>                      >> >> list the interfaces that they are listening to when they boot, and
>>                      >> >> you can also use tools like lsof to find them.
>>                      >> >>
>>                      >> >> If that is indeed the problem, then you might just need to change
>>                      >> >> your conf/slaves file to use <hostname> instead of localhost, and
>>                      >> >> then restart.
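Adam's two checks (which interfaces a machine has, and whether `conf/slaves` names loopback) can be sketched in Java. `InterfaceCheck` is a hypothetical helper; the loopback test is a simple string match on `localhost`, `127.*` and `::1`, so an alias in a hosts file that resolves to loopback would not be caught.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for the interface and conf/slaves checks. */
final class InterfaceCheck {

    /** Returns conf/slaves entries that name loopback instead of a reachable host. */
    static List<String> loopbackEntries(List<String> slavesLines) {
        List<String> flagged = new ArrayList<>();
        for (String raw : slavesLines) {
            String host = raw.trim();
            if (host.isEmpty()) {
                continue; // ignore blank lines
            }
            if (host.equalsIgnoreCase("localhost") || host.startsWith("127.") || host.equals("::1")) {
                flagged.add(host);
            }
        }
        return flagged;
    }

    /** Prints every interface that is up, with its addresses, marking loopback ones. */
    static void printInterfaces() throws SocketException {
        for (NetworkInterface ni : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            if (!ni.isUp()) {
                continue;
            }
            for (InetAddress addr : Collections.list(ni.getInetAddresses())) {
                System.out.println(ni.getName() + (ni.isLoopback() ? " (loopback) " : " ")
                        + addr.getHostAddress());
            }
        }
    }
}
```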
>>                      >> >>
>>                      >> >> Adam
>>                      >> >>
>>                      >> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <threadedblue@gmail.com> wrote:
>>                      >> >>>
>>                      >> >>>
>>                      >> >>> I have been happily working with Acc, but today things changed.
>>                      >> >>> No errors.
>>                      >> >>>
>>                      >> >>> Until now I ran everything server side, which meant the URL was
>>                      >> >>> localhost:2181, and life was good. Today I tried running some of
>>                      >> >>> the same code as a remote client, which means <host name>:2181.
>>                      >> >>> Things hang when a BatchWriter tries to commit anything, and a
>>                      >> >>> Scan hangs when it tries to iterate through a Map.
>>                      >> >>>
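>>                      >> >>> Let's focus on the scan part:
>>                      >> >>>
>>                      >> >>> import java.util.Map.Entry
>>                      >> >>> import org.apache.accumulo.core.data.Key
>>                      >> >>> import org.apache.accumulo.core.data.Value
>>                      >> >>> import org.apache.hadoop.io.Text
>>                      >> >>>
>>                      >> >>> // `scan` is a Scanner obtained earlier from connector.createScanner(...)
>>                      >> >>> scan.fetchColumnFamily(new Text("colfY")) // This executes; the loop below then hangs.
>>                      >> >>> for (Entry<Key, Value> entry : scan) {
>>                      >> >>>     def row = entry.getKey().getRow()
>>                      >> >>>     def value = entry.getValue()
>>                      >> >>>     println "value=" + value
>>                      >> >>> }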
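An Accumulo `Scanner` is an `Iterable<Entry<Key,Value>>`, so the loop above blocks the calling thread for as long as the tserver withholds results, while ZooKeeper keeps answering pings. A generic Java sketch that bounds such a loop with a deadline, so a test program reports a hang instead of waiting forever. `BoundedIteration` is hypothetical, and whether a blocked Accumulo scan responds to thread interruption is not established in this thread; if it does not, the worker thread stays stuck even after the caller gives up.

```java
import java.util.OptionalLong;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: iterates an Iterable on a worker thread with a deadline. */
final class BoundedIteration {

    /** Counts the elements, or returns empty if iteration has not finished in time. */
    static <T> OptionalLong countWithin(Iterable<T> source, long timeout, TimeUnit unit) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Long> result = pool.submit(() -> {
            long n = 0;
            for (T ignored : source) {
                n++;
            }
            return n;
        });
        try {
            return OptionalLong.of(result.get(timeout, unit));
        } catch (TimeoutException e) {
            return OptionalLong.empty();              // still blocked at the deadline
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();       // preserve the caller's interrupt status
            return OptionalLong.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("iteration failed", e.getCause());
        } finally {
            result.cancel(true);   // no-op if finished; otherwise interrupts the worker
            pool.shutdownNow();
        }
    }
}
```

Called as `BoundedIteration.countWithin(scan, 30, TimeUnit.SECONDS)`, an empty result means the scan did not complete within 30 seconds, which is the moment to take the jstack of the tserver that Josh suggests.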
>>                      >> >>>
>>                      >> >>> This is what appears in the console:
>>                      >> >>>
>>                      >> >>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x148c6f03388005e after 21ms
>>                      >> >>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x148c6f03388005e after 21ms
>>                      >> >>> <and on and on>
>>                      >> >>>
>>                      >> >>>
>>                      >> >>>
>>                      >> >>> The only difference between success and a hang is a URL change,
>>                      >> >>> and of course being remote.
>>                      >> >>>
>>                      >> >>> I don't believe this is a firewall issue. I shut down the firewall.
>>                      >> >>>
>>                      >> >>> Am I missing something?
>>                      >> >>>
>>                      >> >>> Thanks all.
>>                      >> >>>
>>                      >> >>> --
>>                      >> >>> There are ways and there are ways,
>>                      >> >>>
>>                      >> >>> Geoffry Roberts
>>                      >> >
>>                      >> >
>>                      >> >
>>                      >> >
>>                      >> > --
>>                      >> > There are ways and there are ways,
>>                      >> >
>>                      >> > Geoffry Roberts
>>                      >
>>                      >
>>                      >
>>                      >
>>                      > --
>>                      > There are ways and there are ways,
>>                      >
>>                      > Geoffry Roberts
>>
>>
>>
>>
>>                 --
>>                 There are ways and there are ways,
>>
>>                 Geoffry Roberts
>>
>>
>>
>>
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>


-- 
There are ways and there are ways,

Geoffry Roberts
