accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-1833) MultiTableBatchWriterImpl.getBatchWriter() is not performant for multiple threads
Date Fri, 08 Nov 2013 03:25:22 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816957#comment-13816957
] 

ASF subversion and git services commented on ACCUMULO-1833:
-----------------------------------------------------------

Commit 0be6f0a79ef404839ab64d60f16991c6031a93a3 in branch refs/heads/ACCUMULO-1833-caching
from [~elserj]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=0be6f0a ]

ACCUMULO-1833 Ensure that we close the MTBW at the end of the test to
avoid it getting GC'ed later and trying to flush when ZK and the
instance is already gone.


> MultiTableBatchWriterImpl.getBatchWriter() is not performant for multiple threads
> ---------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-1833
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1833
>             Project: Accumulo
>          Issue Type: Improvement
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Chris McCubbin
>         Attachments: ACCUMULO-1833-test.patch, ZooKeeperThreadUtilization.png
>
>
> This issue comes from profiling our application. We have a MultiTableBatchWriter created
by normal means. I am attempting to write to it with multiple threads by doing things like
the following:
> {code}
> batchWriter.getBatchWriter(table).addMutations(mutations);
> {code}
> In my test with 4 threads writing to one table, this call is quite inefficient and results
in a large performance degradation over a single BatchWriter.
> I believe the culprit is the fact that the call is synchronized. Also there is the possibility
that the zookeeper call to Tables.getTableState on every call is negatively affecting performance:
> {code}
>   @Override
>   public synchronized BatchWriter getBatchWriter(String tableName) throws AccumuloException,
AccumuloSecurityException, TableNotFoundException {
>     ArgumentChecker.notNull(tableName);
>     String tableId = Tables.getNameToIdMap(instance).get(tableName);
>     if (tableId == null)
>       throw new TableNotFoundException(tableId, tableName, null);
>     
>     if (Tables.getTableState(instance, tableId) == TableState.OFFLINE)
>       throw new TableOfflineException(instance, tableId);
>     
>     BatchWriter tbw = tableWriters.get(tableId);
>     if (tbw == null) {
>       tbw = new TableBatchWriter(tableId);
>       tableWriters.put(tableId, tbw);
>     }
>     return tbw;
>   }
> {code}
> I recommend moving the synchronized block to happen only if the batchwriter is not present,
and also only checking if the table is online at that time:
> {code}
>   @Override
>   public BatchWriter getBatchWriter(String tableName) throws AccumuloException, AccumuloSecurityException,
TableNotFoundException {
>     ArgumentChecker.notNull(tableName);
>     String tableId = Tables.getNameToIdMap(instance).get(tableName);
>     if (tableId == null)
>       throw new TableNotFoundException(tableId, tableName, null);
>     BatchWriter tbw = tableWriters.get(tableId);
>     if (tbw == null) {
>       if (Tables.getTableState(instance, tableId) == TableState.OFFLINE)
>           throw new TableOfflineException(instance, tableId);
>       tbw = new TableBatchWriter(tableId);
>       synchronized(tableWriters){
>           //only create a new table writer if we haven't been beaten to it.
>           if (tableWriters.get(tableId) == null)      
>               tableWriters.put(tableId, tbw);
>       }
>     }
>     return tbw;
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message