You might consider using a combination of AccumuloInputFormat and AccumuloFileOutputFormat in a map/reduce job. The job will run in parallel, speeding up your transformation; the framework's task retries should help with the hiccups; and the bulk load at the end provides an atomic, eventually consistent commit. These input/output formats can also be used with other job frameworks, such as Spark. See for example:



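For a rough idea of the moving parts, here is a minimal, untested sketch against the 1.8 mapreduce API. The table names "from"/"to", the transform() helper, and the argument handling are placeholders for your setup:

    import java.io.IOException;

    import org.apache.accumulo.core.client.ClientConfiguration;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat;
    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class ReKeyJob extends Configured implements Tool {

        public static class ReKeyMapper extends Mapper<Key, Value, Key, Value> {
            @Override
            protected void map(Key key, Value value, Context ctx)
                    throws IOException, InterruptedException {
                // Build the new key schema; the value passes through unaltered.
                Key newKey = new Key(transform(key.getRow()), key.getColumnFamily(),
                    key.getColumnQualifier(), key.getColumnVisibility(), key.getTimestamp());
                ctx.write(newKey, value);
            }

            private Text transform(Text row) {
                return new Text("new-" + row); // placeholder transformation
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            // args: instance zookeepers user password workDir
            ClientConfiguration cc = ClientConfiguration.loadDefault()
                .withInstance(args[0]).withZkHosts(args[1]);
            PasswordToken token = new PasswordToken(args[3]);
            Path work = new Path(args[4]);

            Job job = Job.getInstance(getConf(), "rekey");
            job.setJarByClass(ReKeyJob.class);
            job.setInputFormatClass(AccumuloInputFormat.class);
            AccumuloInputFormat.setZooKeeperInstance(job, cc);
            AccumuloInputFormat.setConnectorInfo(job, args[2], token);
            AccumuloInputFormat.setInputTableName(job, "from");
            AccumuloInputFormat.setScanAuthorizations(job, Authorizations.EMPTY);
            // Restrict the input with setRanges()/fetchColumns(), just like the Scanner.
            job.setMapperClass(ReKeyMapper.class);
            job.setMapOutputKeyClass(Key.class);
            job.setMapOutputValueClass(Value.class);
            // The identity reduce phase hands keys to the output format in sorted
            // order, which AccumuloFileOutputFormat requires within each file.
            job.setReducerClass(Reducer.class);
            job.setOutputKeyClass(Key.class);
            job.setOutputValueClass(Value.class);
            job.setOutputFormatClass(AccumuloFileOutputFormat.class);
            AccumuloFileOutputFormat.setOutputPath(job, new Path(work, "files"));
            if (!job.waitForCompletion(true)) {
                return 1;
            }

            // Bulk-load the generated RFiles; this is the client-side equivalent
            // of the shell's "importdirectory" command.
            FileSystem.get(getConf()).mkdirs(new Path(work, "failures"));
            Connector conn = new ZooKeeperInstance(cc).getConnector(args[2], token);
            conn.tableOperations().importDirectory("to",
                new Path(work, "files").toString(),
                new Path(work, "failures").toString(), false);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new ReKeyJob(), args));
        }
    }

Since the transformed data lands in HDFS as RFiles, nothing touches the destination table until importDirectory runs, so a failed job can simply be re-run beforehand without leaving partial writes behind.
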
On Wed, Jun 21, 2017 at 1:49 AM, Sven Hodapp <> wrote:
Hi there,

I would like to select a subset of an Accumulo table and refactor the keys to create a new table.
There are about 30M records, each with a value of roughly 5-20 KB.
I'm using Accumulo 1.8.0 and Java accumulo-core client library 1.8.0.

I've written client code that does the following:

 * create a scanner fetching a specific column in a specific range
 * transform each key into the new schema
 * use a batch writer to write the newly generated mutations into the new table

    Scanner scan = conn.createScanner(FROM, auths);
    scan.setRange(range);              // restrict to the wanted range
    scan.fetchColumn(colFam, colQual); // fetch the specific column
    BatchWriter writer = conn.createBatchWriter(TO, configWriter);
    for (Map.Entry<Key, Value> entry : scan) {
        Key key = entry.getKey();
        // create a mutation with the new key schema, but an unaltered value
        Mutation m = new Mutation(transform(key.getRow()));
        m.put(key.getColumnFamily(), key.getColumnQualifier(),
              key.getColumnVisibilityParsed(), entry.getValue());
        writer.addMutation(m);
    }
    writer.close();

But this is slow and error-prone (hiccups, ...).
Is it possible to use the Accumulo shell for such a task?
Are there other solutions or tricks I could use?

Thank you very much for any advice!


Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany