accumulo-dev mailing list archives

From Josh Elser <>
Subject Re: better presplitting
Date Fri, 20 Jun 2014 19:09:09 GMT
bq. They all started out on one server

This seems... weird. It would be good to start by identifying what the
balancer code actually does, so we can immediately start testing these
assertions. We can then use the results to identify the deficiencies
that exist.

I think the 200 splits per server figure was an Eric quote from some
time ago (1.4-ish, maybe 1.5). I think it depends on a bunch of things,
workload and available memory most notably, and would be good to
quantify too.

On 6/20/14, 11:58 AM, Sean Busbey wrote:
> One thing that jumped out from the most recent D4M paper was this quote:
>    One issue that was encountered is that after creating the pre-splits,
> they all started out on one server. Accumulo load balanced the splits
> across its servers at rate of ~50 splits/second, which is more than
> adequate for normal operation, but can take ~20 minutes for 50,000 pre-
> splits.[1]
> Do we already have an open ticket that would help this? I think maybe
> there's one about being able to presplit a table that is offline?
> I believe our recommended sweet spot is like 100-200 tablets per server
> (though I can't find the reference for *why* I believe this ATM), which
> means for clusters in the ~100s of nodes this would be in the ballpark for
> an expected number of pre-splits.
> [1]:  arXiv:1406.4923v1 [cs.DB]
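
For reference, the arithmetic behind the quoted figures can be sketched as
below. This is only a back-of-the-envelope check, not anything taken from
the balancer code: the function names are hypothetical, the 50 splits/second
rate and the 50,000 pre-split count come from the D4M quote, and the 250-node
cluster size is an illustrative assumption picked from the "~100s of nodes"
range.

```python
def rebalance_seconds(num_splits, migrations_per_second):
    """Time to spread num_splits tablets at a constant migration rate."""
    return num_splits / migrations_per_second


def expected_presplits(num_servers, tablets_per_server):
    """Pre-split count implied by a tablets-per-server target."""
    return num_servers * tablets_per_server


# Figures from the quoted D4M passage: 50,000 pre-splits at ~50 splits/second.
secs = rebalance_seconds(50_000, 50)
print(f"{secs:.0f} s = {secs / 60:.1f} min")  # 1000 s = 16.7 min

# Assumed cluster size (250 nodes) at the upper end of the 100-200 range.
print(expected_presplits(250, 200))  # 50000
```

At a constant 50 splits/second the 50,000 pre-splits take roughly 17
minutes, which is in line with the "~20 minutes" in the paper, and a
cluster of a few hundred nodes at 100-200 tablets per server lands in the
same tens-of-thousands range.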
