accumulo-dev mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject better presplitting
Date Fri, 20 Jun 2014 18:58:28 GMT
One thing that jumped out at me from the most recent D4M paper was this quote:

  One issue that was encountered is that after creating the pre-splits,
they all started out on one server. Accumulo load balanced the splits
across its servers at rate of ~50 splits/second, which is more than
adequate for normal operation, but can take ~20 minutes for 50,000 pre-
splits.[1]

Do we already have an open ticket that would help with this? I think there
may be one about being able to presplit a table that is offline.

I believe our recommended sweet spot is roughly 100-200 tablets per server
(though I can't find the reference for *why* I believe this ATM), which
means that for clusters with hundreds of nodes, 50,000 is in the ballpark
of the number of pre-splits we should expect.
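
As a rough sanity check on those numbers, here is a minimal back-of-the-envelope sketch. The 50,000 pre-splits and ~50 splits/second come from the quote above, and the 100-200 tablets per server is the sweet spot I mentioned; the 250-node cluster size and the function names are purely illustrative assumptions, not anything taken from the paper or from Accumulo itself.

```python
def balance_minutes(num_splits, splits_per_second):
    """Minutes needed to migrate num_splits tablets at a fixed balancing rate."""
    return num_splits / splits_per_second / 60.0


def expected_tablets(num_servers, tablets_per_server):
    """Total tablet count for a cluster at a given per-server target."""
    return num_servers * tablets_per_server


# Figures from the quoted paper: 50,000 pre-splits at ~50 splits/second.
minutes = balance_minutes(50_000, 50)

# Hypothetical cluster size (assumption): 250 tablet servers,
# at the 100-200 tablets-per-server range mentioned above.
low = expected_tablets(250, 100)
high = expected_tablets(250, 200)

print(f"balancing time: {minutes:.1f} minutes")
print(f"expected tablets: {low} to {high}")
```

At a steady 50 splits/second, 50,000 splits works out to 1,000 seconds, a bit under 17 minutes, which lines up with the "~20 minutes" in the quote. A few hundred servers at the upper end of the range lands right around 50,000 tablets.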


[1]:  arXiv:1406.4923v1 [cs.DB]

-- 
Sean
