accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ACCUMULO-3096) Scans stuck and seeing error message about constraint violation
Date Fri, 29 Aug 2014 20:17:53 GMT
Keith Turner created ACCUMULO-3096:
--------------------------------------

             Summary: Scans stuck and seeing error message about constraint violation
                 Key: ACCUMULO-3096
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3096
             Project: Accumulo
          Issue Type: Bug
            Reporter: Keith Turner


Just helped someone debug an issue. Their scans were getting stuck on a certain tserver (the
tserver was identified by turning on debug in the shell). On the tserver, there was a constant stream
of messages about a metadata table constraint violation because {{Bulk load transaction no longer running}}.

The following code in {{Tablet.importMapFiles()}} 

{code:java}
          synchronized (timeLock) {
            if (bulkTime > persistedTime)
              persistedTime = bulkTime;

            MetadataTableUtil.updateTabletDataFile(tid, extent, paths, tabletTime.getMetadataValue(persistedTime),
                creds, tabletServer.getLock());
          }
{code}

Ended up calling the following code in {{MetadataTableUtil}}.  

{code:java}
public static void update(Credentials credentials, ZooLock zooLock, Mutation m, KeyExtent extent) {
    Writer t = extent.isMeta() ? getRootTable(credentials) : getMetadataTable(credentials);
    if (zooLock != null)
      putLockID(zooLock, m);
    while (true) {
      try {
        t.update(m);
        return;
      } catch (AccumuloException e) {
        log.error(e, e);
      } catch (AccumuloSecurityException e) {
        log.error(e, e);
      } catch (ConstraintViolationException e) {
        log.error(e, e);
      } catch (TableNotFoundException e) {
        log.error(e, e);
      }
      UtilWaitThread.sleep(1000);
    }

  }
{code}

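So when the constraint check failed, the update was retried forever, because the loop catches
{{ConstraintViolationException}}, logs it, sleeps, and tries again. It did this while holding timeLock,
which in turn prevented compactions from completing, which eventually gummed up scans.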
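
For illustration only, here is a minimal sketch of one way such a loop could be bounded. It is not the Accumulo code and not necessarily how this issue should be fixed. {{BoundedRetry}}, {{PermanentFailure}}, {{TransientFailure}} and {{updateWithRetry}} are hypothetical names made up for this example; {{PermanentFailure}} stands in for an error like a constraint violation that will not succeed on retry, and {{TransientFailure}} for one that might.

```java
public class BoundedRetry {

    /** Hypothetical stand-in for a failure that retrying cannot fix (e.g. a constraint violation). */
    public static class PermanentFailure extends RuntimeException {
        public PermanentFailure(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a failure that may succeed on a later attempt. */
    public static class TransientFailure extends RuntimeException {
        public TransientFailure(String message) {
            super(message);
        }
    }

    /**
     * Runs the write up to maxAttempts times and returns the attempt number that succeeded.
     * A PermanentFailure is not caught, so it propagates on the first occurrence.
     * A TransientFailure is retried; the last one is rethrown once attempts run out.
     */
    public static int updateWithRetry(Runnable write, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TransientFailure last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                write.run();
                return attempt;
            } catch (TransientFailure e) {
                last = e; // remember the failure and fall through to the sleep
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw last;
    }
}
```

With a helper along these lines, a caller that holds a lock such as timeLock would get an exception back instead of spinning, and could release the lock before deciding what to do next.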



--
This message was sent by Atlassian JIRA
(v6.2#6252)
