accumulo-user mailing list archives

From Dylan Hutchison <dhutc...@cs.washington.edu>
Subject Re: Key Refactoring
Date Wed, 21 Jun 2017 15:17:50 GMT
Hi Sven,

There are other solutions that depend on what your Key schema
transformation is.

If the new schema is order-compatible with the old one, meaning that the
new Keys have the same sort order as the old keys, then you could (1) clone
the table and (2) attach to the clone a server-side SortedKeyValueIterator
(SKVI) that performs the transformation in all iterator scopes (scan, minc,
and majc).  This changes the schema "on the fly".  Even if your new schema is
order-compatible with the
old schema *up to a prefix* (say, up to the Row), you could use this trick
inside your SKVI by (1) gathering all keys within that prefix (e.g.,
WholeRowIterator), (2) transforming each gathered Key, and (3) emitting the
new Keys in sorted order.
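
To make the prefix trick concrete, here is a minimal sketch in plain Java
collections. It does not use the Accumulo API: SimpleKey, ORDER, and the
family/qualifier swap in transform() are all hypothetical stand-ins chosen for
illustration, and a real SKVI would do the same work inside its seek/next
logic. The point is only that, when the Row is unchanged, sorting each row's
transformed Keys locally is enough to keep the whole output sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PrefixOrderedRewrite {

    /** Simplified stand-in for an Accumulo Key (row, column family, column qualifier). */
    static final class SimpleKey {
        final String row;
        final String family;
        final String qualifier;

        SimpleKey(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier;
        }
    }

    /** Sort order: row, then family, then qualifier (String order; matches byte order for ASCII). */
    static final Comparator<SimpleKey> ORDER =
            Comparator.comparing((SimpleKey k) -> k.row)
                    .thenComparing(k -> k.family)
                    .thenComparing(k -> k.qualifier);

    /** Hypothetical transformation: swap family and qualifier, leave the row untouched. */
    static SimpleKey transform(SimpleKey k) {
        return new SimpleKey(k.row, k.qualifier, k.family);
    }

    /** Rewrites keys that are already sorted by ORDER; the result is sorted by ORDER too. */
    static List<SimpleKey> rewrite(List<SimpleKey> sortedInput) {
        List<SimpleKey> out = new ArrayList<>();
        List<SimpleKey> group = new ArrayList<>();
        String currentRow = null;
        for (SimpleKey k : sortedInput) {
            // A new row starts: the previous row's keys are complete, so emit them.
            if (currentRow != null && !currentRow.equals(k.row)) {
                flush(group, out);
            }
            currentRow = k.row;
            group.add(transform(k)); // (1) gather and (2) transform
        }
        flush(group, out);
        return out;
    }

    /** (3) Sort the gathered keys of one row and append them to the output. */
    static void flush(List<SimpleKey> group, List<SimpleKey> out) {
        group.sort(ORDER);
        out.addAll(group);
        group.clear();
    }

    public static void main(String[] args) {
        List<SimpleKey> input = Arrays.asList(
                new SimpleKey("r1", "a", "z"),
                new SimpleKey("r1", "b", "y"),
                new SimpleKey("r2", "c", "x"));
        System.out.println(rewrite(input));
    }
}
```

Only one row's worth of Keys is held in memory at a time, which is the same
trade-off WholeRowIterator makes: it works as long as a single row fits in
memory.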

If your Key schema transformation non-monotonically changes the Key sort
order, there are fewer built-in Accumulo options.  You might look at the
iterator framework provided by the Graphulo library
<http://graphulo.mit.edu/>.  Graphulo is built to do complex server-side
data processing, reading in entries from some number of tables and writing
them out to a new table at the server (see RemoteWriteIterator
<https://github.com/Accla/graphulo/blob/master/src/main/java/edu/mit/ll/graphulo/skvi/RemoteWriteIterator.java>).
Disclaimer: I authored Graphulo.

If you decide to go with your original solution, you might consider running
multiple such Accumulo clients in parallel.
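
One way to picture the parallel approach is to cut the scanned range into
disjoint sub-ranges and give each worker one of them. The sketch below shows
only that partitioning idea with plain Java collections: the sorted maps stand
in for the source and target tables, and transformKey() and the split points
are hypothetical. In a real client, each worker would instead hold its own
Scanner restricted to its sub-range and write through a BatchWriter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCopy {

    /** Hypothetical key transformation, standing in for the real schema change. */
    static String transformKey(String oldKey) {
        return "v2/" + oldKey;
    }

    /**
     * Copies every entry of source into target under its transformed key,
     * using one worker per sub-range. splitPoints must be sorted; n split
     * points yield n + 1 disjoint sub-ranges that together cover all keys.
     * The target map must be safe for concurrent writes.
     */
    static void copyInParallel(NavigableMap<String, String> source,
                               Map<String, String> target,
                               List<String> splitPoints) {
        ExecutorService pool = Executors.newFixedThreadPool(splitPoints.size() + 1);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i <= splitPoints.size(); i++) {
                String lo = (i == 0) ? null : splitPoints.get(i - 1);
                String hi = (i == splitPoints.size()) ? null : splitPoints.get(i);
                NavigableMap<String, String> slice = slice(source, lo, hi);
                pending.add(pool.submit(() -> {
                    for (Map.Entry<String, String> e : slice.entrySet()) {
                        target.put(transformKey(e.getKey()), e.getValue());
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the worker and surface any failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel copy failed", e);
        } finally {
            pool.shutdown();
        }
    }

    /** Returns the view [lo, hi) of m; a null bound means "unbounded" on that side. */
    static NavigableMap<String, String> slice(NavigableMap<String, String> m, String lo, String hi) {
        if (lo == null && hi == null) return m;
        if (lo == null) return m.headMap(hi, false);
        if (hi == null) return m.tailMap(lo, true);
        return m.subMap(lo, true, hi, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> source = new TreeMap<>();
        for (String k : Arrays.asList("a", "b", "c", "d", "e")) {
            source.put(k, "value-" + k);
        }
        Map<String, String> target = new ConcurrentSkipListMap<>();
        copyInParallel(source, target, Arrays.asList("c"));
        System.out.println(target);
    }
}
```

Because the sub-ranges are disjoint, a worker that fails can be rerun on its
own sub-range without touching the others, which may also help with the
hiccups you mention.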

Cheers, Dylan

On Wed, Jun 21, 2017 at 1:49 AM, Sven Hodapp <sven.hodapp@scai.fraunhofer.de> wrote:

> Hi there,
>
> I would like to select a subset of an Accumulo table and refactor the keys
> to create a new table.
> There are about 30M records, with values of about 5-20 KB each.
> I'm using Accumulo 1.8.0 and Java accumulo-core client library 1.8.0.
>
> I've written client code like this:
>
>  * create a scanner fetching a specific column in a specific range
>  * transform the key into the new schema
>  * use a batch writer to write the newly generated mutations into the new
> table
>
>     scan = createScanner(FROM, auths)
>     // range, fetchColumn
>     writer = createBatchWriter(TO, configWriter)
>     iter = scan.iterator()
>     while (iter.hasNext()) {
>         entry = iter.next()
>         // create mutation with new key schema, but unaltered value
>         writer.addMutation(mutation)
>     }
>     writer.close()
>
> But this is slow and error-prone (hiccups, ...).
> Is it possible to use the Accumulo shell for such a task?
> Are there other solutions or tricks I can use?
>
> Thank you very much for any advice!
>
> Regards,
> Sven
>
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hodapp@scai.fraunhofer.de
> www.scai.fraunhofer.de
>
