accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3193) bulkImport file rename is a bottleneck
Date Thu, 02 Oct 2014 20:18:34 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157109#comment-14157109 ]

Eric Newton commented on ACCUMULO-3193:
---------------------------------------

We've been assuming that files are uniquely named, so we have to rename them anyhow.


> bulkImport file rename is a bottleneck
> --------------------------------------
>
>                 Key: ACCUMULO-3193
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3193
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1
>         Environment: very large cluster
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>             Fix For: 1.5.3, 1.6.2, 1.7.0
>
>
> On a very large cluster, importing a few thousand files takes several minutes.  Most
> of that time is spent renaming the user's files into the accumulo bulk-load directory.  In
> this case, the master is competing against the other demands on the NN.  The master could
> adopt the same strategy as the file GC, and run the renames in parallel, to push more operations
> into the NN at one time.
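
The parallel-rename idea in the description can be sketched roughly as follows. This is not the actual patch: it uses java.nio.file on a local filesystem as a stand-in for renames against HDFS, and the class name ParallelRename, the method renameAll, the "I%07d.rf" naming pattern, and the pool size are all hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: local-filesystem stand-in for renaming bulk-import files in parallel.
public class ParallelRename {

    /**
     * Moves every file in sources into destDir using a fixed-size thread pool,
     * so that many rename requests are outstanding at once instead of one at a time.
     * Each file gets a generated name, so source names do not need to be unique.
     */
    public static List<Path> renameAll(List<Path> sources, Path destDir, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all renames up front; the pool runs up to `threads` of them concurrently.
            List<Future<Path>> pending = new ArrayList<>();
            for (int i = 0; i < sources.size(); i++) {
                final Path src = sources.get(i);
                // Assumed naming scheme, chosen only to guarantee unique targets.
                final Path dst = destDir.resolve(String.format("I%07d.rf", i));
                Callable<Path> task = () -> Files.move(src, dst);
                pending.add(pool.submit(task));
            }
            // Wait for every rename and surface the first failure as an IOException.
            List<Path> moved = new ArrayList<>();
            for (Future<Path> f : pending) {
                try {
                    moved.add(f.get());
                } catch (ExecutionException e) {
                    throw new IOException("rename failed", e.getCause());
                }
            }
            return moved;
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would invoke something like ParallelRename.renameAll(files, bulkDir, 16). The right pool size is an open question here: it trades import latency against the extra load placed on the NN.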



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
