accumulo-user mailing list archives

From "Ott, Charles H." <CHARLES.H....@saic.com>
Subject Dead Tablet Server
Date Tue, 17 Sep 2013 14:23:53 GMT
Forgive my ignorance with this, but I have not yet had a tablet server
failure that I have been able to recover from without restarting the
entire Accumulo cluster.

 

I have 3 tablet servers: 2 online, 1 dead.  I am using Accumulo 1.4.3.

 

The dead tablet server's log reports:

Uncaught exception in TabletServer.main, exiting
         java.lang.RuntimeException: java.lang.RuntimeException: Too many retries, exiting.
                 at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2684)
                 at org.apache.accumulo.server.tabletserver.TabletServer.run(TabletServer.java:2703)
                 at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3168)
                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                 at java.lang.reflect.Method.invoke(Method.java:597)
                 at org.apache.accumulo.start.Main$1.run(Main.java:89)
                 at java.lang.Thread.run(Thread.java:662)
         Caused by: java.lang.RuntimeException: Too many retries, exiting.
                 at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2681)
                 ... 8 more

 

 

The recovery portion of the Admin guide says that recovery is performed
by asking the loggers to copy their write-ahead logs into HDFS.  The
logs are copied and sorted, and then the tablets can find their missing
updates.  Once that is complete, the tablets involved should return to
an 'online' state.
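
As a rough mental model of the three steps the guide describes (copy,
sort, find missing updates), here is a toy sketch in Python.  It is not
Accumulo's implementation: the entry layout (tablet, sequence number,
key, value), the function name recover, and the flushed_seq bookkeeping
are all assumptions made up for illustration.

```python
from collections import defaultdict


def recover(logger_logs, flushed_seq):
    """Toy model of write-ahead log recovery (hypothetical, not Accumulo code).

    logger_logs: list of per-logger lists of (tablet, seq, key, value) entries.
    flushed_seq: dict mapping tablet -> highest sequence number already on disk.
    Returns a dict mapping tablet -> list of (key, value) updates to replay.
    """
    # Step 1: "copy" every logger's entries into one place (stands in for HDFS).
    copied = [entry for log in logger_logs for entry in log]

    # Step 2: sort by tablet, then by sequence number within each tablet.
    copied.sort(key=lambda entry: (entry[0], entry[1]))

    # Step 3: each tablet picks out the updates it had not persisted yet.
    missing = defaultdict(list)
    for tablet, seq, key, value in copied:
        if seq > flushed_seq.get(tablet, 0):
            missing[tablet].append((key, value))
    return dict(missing)


# Two hypothetical loggers holding interleaved entries for tablets t1 and t2.
logs = [
    [("t1", 2, "row_b", "2"), ("t2", 1, "row_x", "9")],
    [("t1", 1, "row_a", "1"), ("t1", 3, "row_c", "3")],
]
# Assume t1 had already persisted everything up to sequence number 1.
result = recover(logs, {"t1": 1})
```

In this model nothing in the sequence is triggered by a user command,
which is part of why I am unsure what "asking the loggers" means in
practice.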

 

I am not sure how to ask the loggers to copy their write-ahead logs into
HDFS.  Is this the same as using the flush shell command?  If so, the
flush command needs either a table name or a pattern of table names.
Would I want to perform something like 'accumulo flush -p .+' to flush
all of the table data to HDFS?
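
To show what a pattern like '.+' would select, here is a minimal Python
sketch.  It assumes the pattern is applied as a regular expression that
must match the whole table name; the helper tables_matching and the
sample table names are hypothetical, and this only illustrates the
matching, not what flush itself does.

```python
import re


def tables_matching(pattern, table_names):
    """Return the table names that the pattern matches in full.

    Assumption: the pattern must match the entire name, not just a prefix.
    """
    return [name for name in table_names if re.fullmatch(pattern, name)]


# Hypothetical table list, for illustration only.
tables = ["!METADATA", "trace", "mytable"]

all_tables = tables_matching(".+", tables)    # every non-empty name matches
only_my = tables_matching("my.*", tables)     # only names starting with "my"
```

Under that assumption, '.+' would select every table, including the
metadata table.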

 

Another concern is that the tablet server process was no longer running
on the server.  I logged into that server and ran "start-here.sh".  The
tablet server is now running, but the monitor still reports it as
'dead'.

 

Thanks in advance,

Charles

