accumulo-user mailing list archives

From Billie J Rinaldi <billie.j.rina...@ugov.gov>
Subject Re: table splits
Date Mon, 21 May 2012 16:56:29 GMT
On Monday, May 21, 2012 12:33:05 PM, "Eric Newton" <eric.newton@gmail.com> wrote:
> You need to estimate the size of the split. First, get the id of the
> table with "tables -l" in the accumulo shell.
> 
> 
> Then, find out the size of the table in HDFS:
> 
> 
> $ hadoop fs -dus /accumulo/tables/<id>
> 
> 
> Divide by 7, and use that as the split size:
> 
> 
> shell> config -t mytable -s table.split.threshold=newsize
> 
> 
> The table will automatically split out. Afterwards, you can then raise
> the split size to keep it from splitting until it gets much bigger:
> 
> 
> shell> config -t mytable -s table.split.threshold=1G


It's going to be hard to get exactly 7 tablets using that method.  When Accumulo sees that a
tablet's size is over the threshold, it attempts to split it in half.  If both of the resulting
tablets are still above the threshold, it splits those in half as well.  Assuming a uniform key
distribution, you're likely to end up with 2^N tablets.  With 8 tablets on 7 servers, one server
would always host two tablets and do twice the work, so you might be better off aiming for a
larger number of tablets, which, I see now, answers your next question.  If the key distribution
isn't uniform, you may not see this 2^N behavior, but I would still recommend having
significantly more tablets than tservers to make load balancing easier.
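
The 2^N behavior can be illustrated with a small simulation. This is only a simplified model, not Accumulo's actual split logic: it assumes uniformly distributed keys, so that every split produces two exactly equal halves, and the function names and the 7 GiB table size are made up for the example.

```python
def split_tablets(table_size, threshold):
    """Halve every tablet larger than threshold until none is; return the sizes."""
    tablets = [table_size]
    while any(size > threshold for size in tablets):
        next_round = []
        for size in tablets:
            if size > threshold:
                # Assumption: uniform keys, so a split yields two equal halves.
                next_round.extend([size / 2, size / 2])
            else:
                next_round.append(size)
        tablets = next_round
    return tablets


def tablets_per_server(n_tablets, n_servers):
    """Spread tablets as evenly as possible; return the count on each server."""
    base, extra = divmod(n_tablets, n_servers)
    return [base + 1 if i < extra else base for i in range(n_servers)]


if __name__ == "__main__":
    table_size = 7 * 1024**3  # hypothetical 7 GiB table
    servers = 7
    for divisor in (7, 50):
        tablets = split_tablets(table_size, table_size / divisor)
        load = tablets_per_server(len(tablets), servers)
        print(f"threshold = size/{divisor}: {len(tablets)} tablets, "
              f"busiest server {max(load)}, idlest server {min(load)}")
```

Under these assumptions, a threshold of size/7 yields 8 tablets, so one server hosts two tablets while the others host one. A threshold of size/50 yields 64 tablets, and the busiest server hosts 10 against 9 on the rest.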

Billie


> -Eric
> 
> 
> 
> On Mon, May 21, 2012 at 12:24 PM, Perko, Ralph J <
> Ralph.Perko@pnnl.gov > wrote:
> 
> 
> Hi,
> 
> I am looking for advice on how best to lay out my table splits. I have
> a 7 node cluster and my table contains ~10M records. I would like to
> split the table equally across all the servers; however, I see no
> utility to do this. I understand I can create splits
> for some letter range but I was hoping for some way to have accumulo
> create "n" equal splits. Is this possible? Right now the best way I
> see to handle this is to write a utility that iterates the table,
> keeps a count and at some given value (table size/ split count) spits
> out the beginning and end row and then I create the split manually.
> 
> Thanks,
> Ralph
> 
> __________________________________________________
> Ralph Perko
> Pacific Northwest National Laboratory
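
The counting utility Ralph describes might look roughly like the sketch below. It is an illustration only: the in-memory list of row IDs stands in for a real scan of the table, the function name is hypothetical, and it assumes each split point is the last row of its tablet.

```python
def equal_split_points(sorted_rows, n_tablets):
    """Return up to n_tablets - 1 row IDs that cut sorted_rows into near-equal runs."""
    if n_tablets < 1:
        raise ValueError("n_tablets must be at least 1")
    total = len(sorted_rows)
    points = []
    for i in range(1, n_tablets):
        idx = i * total // n_tablets  # number of rows that belong before this cut
        if idx == 0:
            continue  # fewer rows than tablets; nothing to cut here
        row = sorted_rows[idx - 1]  # assumed: split point is the tablet's last row
        if not points or row != points[-1]:
            points.append(row)  # skip repeated split points
    return points


if __name__ == "__main__":
    # Stand-in for row IDs read from the table in sorted order.
    rows = [f"row{i:04d}" for i in range(70)]
    print(equal_split_points(rows, 7))
```

The resulting row IDs could then be added by hand, for example with the shell's addsplits command.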
