accumulo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: better presplitting
Date Thu, 26 Jun 2014 01:29:50 GMT
On Wed, Jun 25, 2014 at 4:50 PM, Keith Turner <keith@deenlo.com> wrote:

> I wrote a little utility to time splitting and subsequent balancing.  I
> will post some numbers from running this on EC2
>
> https://gist.github.com/keith-turner/5c561e438cb04c501b6e
>

I posted some performance numbers from that utility on

https://issues.apache.org/jira/browse/ACCUMULO-2368


>
>
> On Fri, Jun 20, 2014 at 2:58 PM, Sean Busbey <busbey@cloudera.com> wrote:
>
>> One thing that jumped out from the most recent D4M paper was this quote:
>>
>>   One issue that was encountered is that after creating the pre-splits,
>> they all started out on one server. Accumulo load balanced the splits
>> across its servers at rate of ~50 splits/second, which is more than
>> adequate for normal operation, but can take ~20 minutes for 50,000 pre-
>> splits.[1]
>>
>> Do we already have an open ticket that would help this? I think maybe
>> there's one about being able to presplit a table that is offline?
>>
>> I believe our recommended sweet spot is like 100-200 tablets per server
>> (though I can't find the reference for *why* I believe this ATM), which
>> means for clusters in the ~100s of nodes this would be in the ballpark for
>> an expected number of pre-splits.
>>
>>
>> [1]:  arXiv:1406.4923v1 [cs.DB]
>>
>> --
>> Sean
>>
>
>
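
For a rough sanity check on the figures quoted above, here is a back-of-the-envelope sketch in Python. It assumes the balancer moves tablets at a constant rate, which is a simplification; the function names are made up for illustration and are not part of Accumulo. The only inputs taken from the thread are the ~50 splits/second rate, the 50,000 pre-splits, and the 100-200 tablets per server guideline.

```python
def balance_minutes(num_splits, splits_per_second=50.0):
    """Minutes needed to migrate num_splits tablets, assuming a constant rate."""
    return num_splits / splits_per_second / 60.0


def expected_tablets(num_servers, tablets_per_server):
    """Total tablets for a cluster at a given per-server tablet target."""
    return num_servers * tablets_per_server


if __name__ == "__main__":
    # 50,000 pre-splits at ~50 splits/second (figures quoted from the paper)
    print(f"balance time: {balance_minutes(50_000):.1f} minutes")

    # 100-200 tablets per server on a hypothetical 250-node cluster
    for per_server in (100, 200):
        total = expected_tablets(250, per_server)
        print(f"{per_server} tablets/server -> {total} tablets")
```

Under these assumptions the 50,000 pre-splits take roughly 17 minutes, which is in line with the "~20 minutes" in the quote, and a 250-node cluster at 200 tablets per server lands at exactly 50,000 tablets.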
