accumulo-user mailing list archives

From "Perko, Ralph J" <Ralph.Pe...@pnnl.gov>
Subject Re: table splits
Date Mon, 21 May 2012 16:48:03 GMT
Eric,

Thanks for the quick reply.  Another question: my cluster has over 80 CPUs available.  Suppose
I create something like 50 splits across the 7 servers and increase my map job count
accordingly.  What are your thoughts on this?

Thanks,
Ralph

__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory



From: Eric Newton <eric.newton@gmail.com>
Reply-To: "user@accumulo.apache.org" <user@accumulo.apache.org>
To: "user@accumulo.apache.org" <user@accumulo.apache.org>
Subject: Re: table splits

You need to estimate the size of the split.  First, get the id of the table with "tables -l"
in the accumulo shell.

Then, find out the size of the table in HDFS:

 $ hadoop fs -dus /accumulo/tables/<id>

Divide by 7, and use that as the split size:

 shell> config -t mytable -s table.split.threshold=newsize
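
The arithmetic in the step above can be sketched as follows. This is only an illustration: the function names and the 7,000,000,000-byte table size are hypothetical, and it assumes a plain number given to table.split.threshold is read as bytes. Because tablets split roughly in half, the resulting tablet count will be approximate rather than exactly 7.

```python
def split_threshold_bytes(table_bytes: int, n_tablets: int) -> int:
    """Return table_bytes / n_tablets, rounded up to a whole byte."""
    if n_tablets < 1:
        raise ValueError("n_tablets must be at least 1")
    if table_bytes < 0:
        raise ValueError("table_bytes must not be negative")
    # Ceiling division without floating point.
    return -(-table_bytes // n_tablets)


def config_command(table: str, threshold: int) -> str:
    """Format the shell command from the message with the computed size."""
    return f"config -t {table} -s table.split.threshold={threshold}"


if __name__ == "__main__":
    # Hypothetical size, standing in for the number reported by
    # "hadoop fs -dus /accumulo/tables/<id>".
    size = 7_000_000_000
    print(config_command("mytable", split_threshold_bytes(size, 7)))
```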

The table will then split automatically.  Afterwards, you can raise the split size to
keep it from splitting again until it gets much bigger:

 shell> config -t mytable -s table.split.threshold=1G

-Eric

On Mon, May 21, 2012 at 12:24 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov> wrote:
Hi,

I am looking for advice on how best to lay out my table splits.  I have a 7-node cluster and
my table contains ~10M records.  I would like to split the table equally across all the servers,
but I see no utility that does this.  I understand I can create splits for
some letter range, but I was hoping for some way to have Accumulo create "n" equal splits.
Is this possible?  Right now the best way I see to handle this is to write a utility that
iterates over the table, keeps a count, and at every multiple of some given value (table size /
split count) emits the current row as a boundary, and then I create the splits manually.

Thanks,
Ralph

__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory



