accumulo-notifications mailing list archives

From "Chris McCubbin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ACCUMULO-1833) MultiTableBatchWriterImpl.getBatchWriter() is not performant for multiple threads
Date Thu, 31 Oct 2013 15:13:19 GMT
Chris McCubbin created ACCUMULO-1833:
----------------------------------------

             Summary: MultiTableBatchWriterImpl.getBatchWriter() is not performant for multiple threads
                 Key: ACCUMULO-1833
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1833
             Project: Accumulo
          Issue Type: Improvement
    Affects Versions: 1.5.0, 1.6.0
            Reporter: Chris McCubbin


This issue comes from profiling our application. We have a MultiTableBatchWriter created in
the usual way, and multiple threads write to it with calls like the following:

{code}
batchWriter.getBatchWriter(table).addMutations(mutations);
{code}

In my test with 4 threads writing to one table, this call is quite inefficient and results
in a large performance degradation compared with a single BatchWriter.
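
A comparison like this could be timed with a small harness along the following lines. This is only a sketch: ContentionHarness and timeCalls are hypothetical names, and the Runnable stands in for the batchWriter.getBatchWriter(table).addMutations(mutations) call, which is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical timing harness; not part of Accumulo. */
class ContentionHarness {
  /**
   * Runs work callsPerThread times on each of the given number of threads
   * and returns the elapsed wall-clock time in nanoseconds.
   */
  static long timeCalls(int threads, final int callsPerThread, final Runnable work)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<Void>> tasks = new ArrayList<Callable<Void>>();
      for (int t = 0; t < threads; t++) {
        tasks.add(new Callable<Void>() {
          public Void call() {
            for (int i = 0; i < callsPerThread; i++) {
              work.run(); // stand-in for the call being measured
            }
            return null;
          }
        });
      }
      long start = System.nanoTime();
      // invokeAll blocks until every task has finished.
      for (Future<Void> f : pool.invokeAll(tasks)) {
        f.get(); // rethrows any failure from a worker thread
      }
      return System.nanoTime() - start;
    } finally {
      pool.shutdown();
    }
  }
}
```

Running the same workload with 1 thread and with 4 threads, once through the MultiTableBatchWriter and once through a plain BatchWriter, would give the numbers to compare.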

I believe the culprit is that the method is synchronized. The ZooKeeper call to
Tables.getTableState on every invocation may also be hurting performance:

{code}
  @Override
  public synchronized BatchWriter getBatchWriter(String tableName)
      throws AccumuloException, AccumuloSecurityException, TableNotFoundException {
    ArgumentChecker.notNull(tableName);
    String tableId = Tables.getNameToIdMap(instance).get(tableName);
    if (tableId == null)
      throw new TableNotFoundException(tableId, tableName, null);
    
    if (Tables.getTableState(instance, tableId) == TableState.OFFLINE)
      throw new TableOfflineException(instance, tableId);
    
    BatchWriter tbw = tableWriters.get(tableId);
    if (tbw == null) {
      tbw = new TableBatchWriter(tableId);
      tableWriters.put(tableId, tbw);
    }
    return tbw;
  }
{code}

I recommend synchronizing only when the BatchWriter is not yet present, and checking whether
the table is online only at that time:

{code}
  @Override
  public BatchWriter getBatchWriter(String tableName)
      throws AccumuloException, AccumuloSecurityException, TableNotFoundException {
    ArgumentChecker.notNull(tableName);
    String tableId = Tables.getNameToIdMap(instance).get(tableName);
    if (tableId == null)
      throw new TableNotFoundException(tableId, tableName, null);

    // Fast path without locking. Assumes tableWriters is safe for
    // unsynchronized reads (e.g. a ConcurrentHashMap).
    BatchWriter tbw = tableWriters.get(tableId);
    if (tbw == null) {
      // Only check the table state when a new writer has to be created.
      if (Tables.getTableState(instance, tableId) == TableState.OFFLINE)
        throw new TableOfflineException(instance, tableId);
      synchronized (tableWriters) {
        // Re-check under the lock: only create a new table writer if we
        // haven't been beaten to it, and otherwise return the existing one.
        tbw = tableWriters.get(tableId);
        if (tbw == null) {
          tbw = new TableBatchWriter(tableId);
          tableWriters.put(tableId, tbw);
        }
      }
    }
    return tbw;
  }
{code}
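
The same check-then-create idea can also be sketched without an explicit lock by using ConcurrentMap.putIfAbsent. The example below is a self-contained illustration, not Accumulo code: WriterCache and Writer are hypothetical stand-ins, and it assumes the writer map could be replaced by a concurrent map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for a per-table writer cache; not Accumulo code. */
class WriterCache {
  /** Hypothetical placeholder for a per-table writer. */
  static final class Writer {
    final String tableId;

    Writer(String tableId) {
      this.tableId = tableId;
    }
  }

  private final ConcurrentMap<String, Writer> writers =
      new ConcurrentHashMap<String, Writer>();

  /** Returns the single shared writer for tableId, creating it on first use. */
  Writer get(String tableId) {
    Writer w = writers.get(tableId); // lock-free fast path for the common case
    if (w == null) {
      Writer created = new Writer(tableId);
      // putIfAbsent returns the previously mapped writer if another thread won the race.
      Writer existing = writers.putIfAbsent(tableId, created);
      w = (existing != null) ? existing : created;
    }
    return w;
  }
}
```

With this approach every caller receives the same writer for a given table, and the losing thread in a race simply discards the instance it created, so it only fits when constructing a writer is cheap and has no side effects.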



--
This message was sent by Atlassian JIRA
(v6.1#6144)
