accumulo-user mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject Presplitting tables for the YCSB workloads
Date Fri, 18 Sep 2015 02:10:29 GMT
YCSB is gearing up for its next monthly release, and I really want to
add an Accumulo-specific README for running workloads.

This is mostly so that folks have an easier time running tests
themselves. It's also because I keep testing Accumulo for the YCSB
releases, and with a README file in place we'd get an Accumulo-specific
convenience binary. Avoiding the bulk of the dependencies that get
included in the generic YCSB distribution artifact is a big win.

The thing I keep getting hung up on is remembering how to properly
split the Accumulo table for YCSB workloads. The HBase README has a
great hbase shell snippet for doing this (because users can copy/paste
it)[1]:

----
3. Create a HBase table for testing

For best results, use the pre-splitting strategy recommended in HBASE-4163:

hbase(main):001:0> n_splits = 200 # HBase recommends (10 * number of regionservers)
hbase(main):002:0> create 'usertable', 'family', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}

Failing to do so will cause all writes to initially target a single
region server.
----
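
For reference, the split points that snippet computes can be reproduced
in plain Ruby outside the hbase shell. This is only a minimal sketch: the
helper name ycsb_split_points is hypothetical (it is not part of YCSB,
HBase, or Accumulo), and it assumes the same "user" + number row key
layout the HBase README relies on.

```ruby
# Hypothetical helper, not part of YCSB/HBase/Accumulo: returns the split
# points produced by the README's formula for a given number of splits.
def ycsb_split_points(n_splits)
  # Integer arithmetic spreads the points evenly across user1000..user9999.
  (1..n_splits).map { |i| "user#{1000 + i * (9999 - 1000) / n_splits}" }
end

splits = ycsb_split_points(200)
puts splits.length  # 200
puts splits.first   # user1044
puts splits.last    # user9999
```

Printing one split point per line (puts splits) gives a plain list that
could presumably be fed to whatever mechanism ends up creating the
Accumulo splits.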

Anyone have a work up of an equivalent for Accumulo that I can include
under an ASLv2 license? I seem to recall madrob had something done in
a bash script, but I can't find it anywhere.

[1]: http://s.apache.org/CFe

-- 
Sean
