accumulo-user mailing list archives

From Adam Fuchs <afu...@apache.org>
Subject Re: Key Refactoring
Date Wed, 21 Jun 2017 16:43:17 GMT
Sven,

You might consider using a combination of AccumuloInputFormat and
AccumuloFileOutputFormat in a map/reduce job. The job will run in parallel,
speeding up your transformation; the map/reduce framework should help with
hiccups by retrying failed tasks; and the bulk load at the end provides an
atomic, eventually consistent commit. These input/output formats can also be
used with other job frameworks like Spark. See for example:

examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/TableToFile.java
examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/bulk/BulkIngestExample.java
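
As a rough illustration (not taken from the examples above): files written
for bulk load expect their keys in sorted order, so once the keys have been
rewritten they generally need to be re-sorted, which the shuffle of a reduce
phase would normally take care of. The sketch below shows just that
transform-then-sort step in plain Java with no Accumulo dependencies. The
swapped row/qualifier schema and the names refactorKey and transformSorted
are assumptions made up for illustration, not part of any Accumulo API.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyRefactorSketch {

    // Hypothetical new schema: swap row and column qualifier, keep the family.
    static String[] refactorKey(String row, String family, String qualifier) {
        return new String[] { qualifier, family, row };
    }

    // Takes entries of the form {row, family, qualifier, value}, rewrites each
    // key with refactorKey, leaves the value unaltered, and returns the entries
    // sorted by the new (row, family, qualifier).
    static List<String[]> transformSorted(List<String[]> entries) {
        List<String[]> out = new ArrayList<>();
        for (String[] e : entries) {
            String[] k = refactorKey(e[0], e[1], e[2]);
            out.add(new String[] { k[0], k[1], k[2], e[3] });
        }
        // String comparison stands in for Accumulo's byte-wise key ordering;
        // the two agree for plain ASCII components.
        out.sort((a, b) -> {
            for (int i = 0; i < 3; i++) {
                int c = a[i].compareTo(b[i]);
                if (c != 0) {
                    return c;
                }
            }
            return 0;
        });
        return out;
    }

    public static void main(String[] args) {
        List<String[]> entries = new ArrayList<>();
        entries.add(new String[] { "doc1", "meta", "zeta", "v1" });
        entries.add(new String[] { "doc2", "meta", "alpha", "v2" });
        for (String[] e : transformSorted(entries)) {
            System.out.println(String.join(" ", e));
        }
    }
}
```

In an actual job the body of refactorKey would live in the mapper, and the
framework's sort between map and reduce would replace the in-memory sort.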

Cheers,
Adam



On Wed, Jun 21, 2017 at 1:49 AM, Sven Hodapp <sven.hodapp@scai.fraunhofer.de> wrote:

> Hi there,
>
> I would like to select a subset of an Accumulo table and refactor the keys
> to create a new table.
> There are about 30M records with a value size of about 5-20KB each.
> I'm using Accumulo 1.8.0 and Java accumulo-core client library 1.8.0.
>
> I've written client code like this:
>
>  * create a scanner fetching a specific column in a specific range
>  * transform the key into the new schema
>  * use a batch writer to write the newly generated mutations into the new
> table
>
>     scan = createScanner(FROM, auths)
>     // range, fetchColumn
>     writer = createBatchWriter(TO, configWriter)
>     iter = scan.iterator()
>     while (iter.hasNext()) {
>         entry = iter.next()
>         // create mutation with new key schema, but unaltered value
>         writer.addMutation(mutation)
>     }
>     writer.close()
>
> But this is slow and error-prone (hiccups, ...).
> Is it possible to use the Accumulo shell for such a task?
> Are there other solutions or tricks I can use?
>
> Thank you very much for any advice!
>
> Regards,
> Sven
>
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hodapp@scai.fraunhofer.de
> www.scai.fraunhofer.de
>
