accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Dead Tablet Server
Date Tue, 17 Sep 2013 19:20:19 GMT
On Tue, Sep 17, 2013 at 10:23 AM, Ott, Charles H. <CHARLES.H.OTT@saic.com> wrote:

> Forgive my ignorance with this, but I have not yet had a tablet failure
> that I have been able to recover without restarting the entire Accumulo
> cluster.
>
> I have 3 Tablets, 2 Online, 1 dead.  Using Accumulo 1.4.3
>
> The tablet error reports:
>
> Uncaught exception in TabletServer.main, exiting
>
> java.lang.RuntimeException: java.lang.RuntimeException: Too many retries, exiting.
>     at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2684)
>     at org.apache.accumulo.server.tabletserver.TabletServer.run(TabletServer.java:2703)
>     at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3168)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.accumulo.start.Main$1.run(Main.java:89)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.RuntimeException: Too many retries, exiting.
>     at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2681)
>     ... 8 more

It would be nice to add this stack trace as a comment on ACCUMULO-1277 to
make it easier to find via Google.  Would you like to do this?  If not, I
can.


> The recovery portion of the Admin guide says that recovery is performed by
> asking the loggers to copy their write-ahead logs into HDFS.  The logs are
> copied, sorted, and then tablets can find missing updates.  Once complete,
> the tablets involved should return to an ‘online’ state.
>
> I am not sure how to ask the loggers to copy their write-ahead logs into
> HDFS.  Is this the same as using the flush shell command?  If so, the flush
> command needs a pattern of tables or a table name.  Would I want to perform
> something like ‘accumulo flush -p .+’ to flush all of the table data to
> HDFS?
>
> Another concern is that the Tablet Server process was no longer running on
> the server.  I logged into that server and ran “start-here.sh”.  The tablet
> server is now running, but it is still reported as ‘dead’ by the monitor.
>
> Thanks in advance,
>
> Charles
>
