accumulo-user mailing list archives

From: "Michael Moss (BLOOMBERG/ 731 LEX)" <mmos...@bloomberg.net>
Subject: 1 of 20 TServers unresponsive/slow, all writes fail?
Date: Fri, 09 Sep 2016 13:40:42 GMT

Hi,

We are starting to investigate an issue where 1 tserver was up but became slow/unresponsive
for several hours, and during that time all writes to our 20+ servers began to fail. Leading up
to the failure, we could see that writes were distributed among all of the tablet servers, so it
wasn't a hotspot. Whenever we receive a MutationsRejectedException, we recreate the BatchWriter
(ACCUMULO-2990). I'm digging into the TabletServerBatchWriter code, but does anyone have ideas
about what could cause this? Is there some sort of initialization or health checking the client
does through which 1 server could impact all writes?

Thanks.

-Mike

Caused by: org.apache.accumulo.core.client.TimedOutException: Servers timed out [pnj-bvlt-r4n03.abc.com:31113]
    at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$TimeoutTracker.wroteNothing(TabletServerBatchWriter.java:177) ~[stormjar.jar:1.0]
    at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$TimeoutTracker.errorOccured(TabletServerBatchWriter.java:182) ~[stormjar.jar:1.0]
    at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer(TabletServerBatchWriter.java:933) ~[stormjar.jar:1.0]
    at
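
For readers following along, the recreate-on-rejection handling described in the message might look roughly like the sketch below. It is not the poster's code and does not use the Accumulo API: Writer, RejectedException, RecreatingWriter and ScriptedWriter are hypothetical stand-ins (for BatchWriter and MutationsRejectedException) so that the control flow can run on its own.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for Accumulo's MutationsRejectedException. */
class RejectedException extends Exception {
    RejectedException(String message) { super(message); }
}

/** Hypothetical stand-in for the small part of a BatchWriter used here. */
interface Writer {
    void write(String mutation) throws RejectedException;
    void close();
}

/** Wraps a writer and swaps in a fresh one after every rejection. */
class RecreatingWriter {
    private final Supplier<Writer> factory;
    private Writer current;
    private int recreations = 0;

    RecreatingWriter(Supplier<Writer> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    /**
     * Attempts the write up to maxRetries + 1 times. Each rejection closes
     * the current writer and replaces it before the next attempt.
     * Returns true if any attempt succeeded.
     */
    boolean write(String mutation, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                current.write(mutation);
                return true;
            } catch (RejectedException e) {
                current.close();
                current = factory.get();
                recreations++;
            }
        }
        return false;
    }

    int recreations() { return recreations; }
}

/** Test double: rejects every write unless constructed as healthy. */
class ScriptedWriter implements Writer {
    private final boolean healthy;

    ScriptedWriter(boolean healthy) { this.healthy = healthy; }

    @Override
    public void write(String mutation) throws RejectedException {
        if (!healthy) {
            throw new RejectedException("Servers timed out");
        }
    }

    @Override
    public void close() { }
}
```

The sketch covers only the client-side retry loop. It does not model why a single slow server would affect writes bound for other servers, which is the open question in the message. It does show that if every fresh writer hits the same failure, recreating it never succeeds.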