Sven,

You might consider using a combination of AccumuloInputFormat and AccumuloFileOutputFormat in a map/reduce job. The job will run in parallel, speeding up your transformation; the map/reduce framework should help with hiccups (failed tasks are retried); and the bulk load at the end provides an atomic, eventually consistent commit. These input/output formats can also be used with other job frameworks, such as Spark. See for example:

examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/TableToFile.java
examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/bulk/BulkIngestExample.java
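To make the shape of such a job concrete, here is a minimal sketch that roughly combines what those two examples do (the instance/host names, table names, range, column, and key transformation are placeholders you would replace with your own):

import java.io.IOException;
import java.util.Collections;

import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.accumulo.core.util.Pair;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class RekeyTableJob extends Configured implements Tool {

  // Rewrite the key here; the value passes through unchanged.
  public static class RekeyMapper extends Mapper<Key,Value,Key,Value> {
    @Override
    protected void map(Key key, Value value, Context ctx)
        throws IOException, InterruptedException {
      // the "new-" row prefix is only an example of a key transformation
      Key newKey = new Key(new Text("new-" + key.getRow()), key.getColumnFamily(),
          key.getColumnQualifier(), key.getColumnVisibility(), key.getTimestamp());
      ctx.write(newKey, value);
    }
  }

  // Identity reducer: the shuffle delivers the rewritten keys in sorted order,
  // which AccumuloFileOutputFormat requires.
  public static class SortReducer extends Reducer<Key,Value,Key,Value> {
    @Override
    protected void reduce(Key key, Iterable<Value> values, Context ctx)
        throws IOException, InterruptedException {
      for (Value v : values)
        ctx.write(key, v);
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), "rekey table");
    job.setJarByClass(RekeyTableJob.class);

    // read the source table in parallel, restricted to the range/column of interest
    job.setInputFormatClass(AccumuloInputFormat.class);
    AccumuloInputFormat.setZooKeeperInstance(job,
        ClientConfiguration.loadDefault().withInstance("myinstance").withZkHosts("zk1:2181"));
    AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("secret"));
    AccumuloInputFormat.setInputTableName(job, "source_table");
    AccumuloInputFormat.setScanAuthorizations(job, new Authorizations("my_auth"));
    AccumuloInputFormat.setRanges(job, Collections.singleton(new Range("a", "z")));
    AccumuloInputFormat.fetchColumns(job,
        Collections.singleton(new Pair<Text,Text>(new Text("fam"), new Text("qual"))));

    job.setMapperClass(RekeyMapper.class);
    job.setReducerClass(SortReducer.class);
    job.setMapOutputKeyClass(Key.class);
    job.setMapOutputValueClass(Value.class);
    job.setOutputKeyClass(Key.class);
    job.setOutputValueClass(Value.class);

    // write sorted RFiles suitable for bulk import into the target table
    job.setOutputFormatClass(AccumuloFileOutputFormat.class);
    AccumuloFileOutputFormat.setOutputPath(job, new Path("/tmp/rekey-out"));

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new RekeyTableJob(), args));
  }
}

After the job finishes, you would bulk-load the resulting files into the (pre-created) target table, e.g. connector.tableOperations().importDirectory("target_table", "/tmp/rekey-out", "/tmp/rekey-failures", false); the failures directory must exist and be empty.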

Cheers,
Adam



On Wed, Jun 21, 2017 at 1:49 AM, Sven Hodapp <sven.hodapp@scai.fraunhofer.de> wrote:
Hi there,

I would like to select a subset of an Accumulo table and rewrite the keys to create a new table.
There are about 30M records with a value size of about 5-20 KB each.
I'm using Accumulo 1.8.0 and Java accumulo-core client library 1.8.0.

I've written client code that does the following:

 * create a scanner fetching a specific column in a specific range
 * transform each key into the new schema
 * use a batch writer to write the newly generated mutations into the new table

    Scanner scan = connector.createScanner(FROM, auths);
    scan.setRange(range);                // restrict to the relevant key range
    scan.fetchColumn(colFam, colQual);   // fetch only the needed column
    BatchWriter writer = connector.createBatchWriter(TO, configWriter);
    for (Map.Entry<Key,Value> entry : scan) {
        Key key = entry.getKey();
        // create mutation with new key schema (transformRow), but unaltered value
        Mutation mutation = new Mutation(transformRow(key.getRow()));
        mutation.put(key.getColumnFamily(), key.getColumnQualifier(),
            new ColumnVisibility(key.getColumnVisibility()), entry.getValue());
        writer.addMutation(mutation);
    }
    writer.close();

But this is slow and error-prone (hiccups, ...).
Is it possible to use the Accumulo shell for such a task?
Are there other solutions or tricks I can use?

Thank you very much for any advice!

Regards,
Sven

--
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hodapp@scai.fraunhofer.de
www.scai.fraunhofer.de