accumulo-notifications mailing list archives

From "Mike Drob (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ACCUMULO-2716) Duplicate connection loss logging in Writer
Date Tue, 22 Apr 2014 17:20:15 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Drob updated ACCUMULO-2716:
--------------------------------

    Description: 
Running CI with agitation, I see lots of duplicated messages in the monitor whenever a tserver
dies.

| WARN | Error connecting to tserver1.example.com:10011: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused |
| ERROR | error sending update to tserver1.example.com:10011: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused |

These always occur in pairs, at the same millisecond, and come from the same tserver. I
_think_ that they are updates to the metadata table coming from these tservers, such as flushes
or compactions that fail because the dead server was hosting the corresponding metadata tablet,
but the exact source doesn't really matter for this bug.

The culprit is in Writer.java, where we log and rethrow in {{updateServer()}}:
{code}
    } catch (TTransportException e) {
      log.warn("Error connecting to " + server + ": " + e);
      throw e;
    }
{code}

and then later log again in {{update()}}:
{code}
      } catch (TException e) {
        log.error("error sending update to " + tabLoc.tablet_location + ": " + e);
        TabletLocator.getLocator(instance, table).invalidateCache(tabLoc.tablet_extent);
      }
{code}
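The double logging can be reproduced outside Accumulo with a small, self-contained Java sketch. It is not the fix committed for this issue: {{TransportFailure}}, {{connect}}, {{updateServerLogAndRethrow}}, {{updateServerRethrowOnly}} and {{update}} are hypothetical stand-ins for {{TTransportException}} and the two Writer methods, and log records go into a list rather than through a real logger. One connection failure yields two records when the inner method logs and rethrows, and one record when it only rethrows.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, self-contained model of the log-and-rethrow pattern described in
 * ACCUMULO-2716. Nothing here is Accumulo or Thrift code.
 */
public class DuplicateLoggingDemo {

  /** Hypothetical stand-in for org.apache.thrift.transport.TTransportException. */
  static class TransportFailure extends Exception {
    TransportFailure(String message) {
      super(message);
    }
  }

  /** Log records are collected here instead of going to a real logger. */
  static final List<String> LOG = new ArrayList<>();

  /** Simulates a dead tserver: every connection attempt is refused. */
  static void connect(String server) throws TransportFailure {
    throw new TransportFailure("java.net.ConnectException: Connection refused");
  }

  /** Mirrors the reported pattern: the inner method logs and then rethrows. */
  static void updateServerLogAndRethrow(String server) throws TransportFailure {
    try {
      connect(server);
    } catch (TransportFailure e) {
      LOG.add("WARN Error connecting to " + server + ": " + e.getMessage());
      throw e;
    }
  }

  /** Alternative: the inner method lets the exception propagate without logging. */
  static void updateServerRethrowOnly(String server) throws TransportFailure {
    connect(server);
  }

  /**
   * Caller that handles the failure and logs it once, like update() in the report.
   * Returns how many log records a single failed update produced.
   */
  static int update(String server, boolean innerLogs) {
    int before = LOG.size();
    try {
      if (innerLogs) {
        updateServerLogAndRethrow(server);
      } else {
        updateServerRethrowOnly(server);
      }
    } catch (TransportFailure e) {
      LOG.add("ERROR error sending update to " + server + ": " + e.getMessage());
      // The real update() also invalidates the cached tablet location at this point.
    }
    return LOG.size() - before;
  }

  public static void main(String[] args) {
    String server = "tserver1.example.com:10011";
    System.out.println(update(server, true));  // prints 2: WARN followed by ERROR
    System.out.println(update(server, false)); // prints 1: ERROR only
  }
}
```

The sketch keeps the single log call in the caller because, in the excerpts above, {{update()}} catches the broader {{TException}} and is where the cache invalidation happens, so it sees every failure, while {{updateServer()}} only sees {{TTransportException}}. In the real code, passing the exception as the throwable argument (for example log4j's {{log.error(message, e)}}) instead of concatenating it into the message would also preserve the stack trace; treat this as a suggestion rather than the committed change.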



  was:
Running CI with agitation, I see lots of duplicated messages in the monitor whenever a tserver
dies.

| WARN | Error connecting to tserver1.example.com:10011: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused |
| ERROR | error sending update to a2422.halxg.cloudera.com:10011: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused |

These always occur in pairs, at the same millisecond, and come from the same tserver. I
_think_ that they are updates to the metadata table coming from these tservers, such as flushes
or compactions that fail because the dead server was hosting the corresponding metadata tablet,
but the exact source doesn't really matter for this bug.

The culprit is in Writer.java, where we log and rethrow in {{updateServer()}}:
{code}
    } catch (TTransportException e) {
      log.warn("Error connecting to " + server + ": " + e);
      throw e;
    }
{code}

and then later log again in {{update()}}:
{code}
      } catch (TException e) {
        log.error("error sending update to " + tabLoc.tablet_location + ": " + e);
        TabletLocator.getLocator(instance, table).invalidateCache(tabLoc.tablet_extent);
      }
{code}




> Duplicate connection loss logging in Writer
> -------------------------------------------
>
>                 Key: ACCUMULO-2716
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2716
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>              Labels: logging
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
