accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: table splits
Date Mon, 21 May 2012 18:10:12 GMT
Yes, you can and should do that.  Just keep an eye on how much work each
map/reduce task gets: give each one enough work that you aren't spending 30
seconds starting a process that finishes in 10 seconds.

-Eric

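[For concreteness, the threshold arithmetic from the quoted reply below can be
sketched as a small shell helper. The table size and server count used here
are hypothetical examples, not values from the thread.]

```shell
# Sketch: compute a per-server split threshold as
# (total table size in bytes) / (number of tablet servers).
split_threshold() {
  size_bytes=$1
  servers=$2
  echo $(( size_bytes / servers ))
}

# e.g. a hypothetical 70 GB table spread across 7 servers:
split_threshold 70000000000 7
# prints 10000000000

# then, in the accumulo shell:
#   config -t mytable -s table.split.threshold=<result>
```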
On Mon, May 21, 2012 at 12:48 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov> wrote:

> Eric,
>
> Thanks for the quick reply.  Another question - My cluster has over 80
> cpus available.  Suppose I create something like 50 splits across the 7
> servers - I will increase my map job count accordingly.  What are your
> thoughts on this?
>
> Thanks,
> Ralph
>
> __________________________________________________
> Ralph Perko
> Pacific Northwest National Laboratory
>
>
>
> From: Eric Newton <eric.newton@gmail.com>
> Reply-To: "user@accumulo.apache.org" <user@accumulo.apache.org>
> To: "user@accumulo.apache.org" <user@accumulo.apache.org>
> Subject: Re: table splits
>
> You need to estimate the size of the split.  First, get the id of the
> table with "tables -l" in the accumulo shell.
>
> Then, find out the size of the table in HDFS:
>
>  $ hadoop fs -dus /accumulo/tables/<id>
>
> Divide by 7, and use that as the split size:
>
>  shell> config -t mytable -s table.split.threshold=newsize
>
> The table will automatically split out.  Afterward, you can raise the
> split size to keep it from splitting again until it gets much bigger:
>
> shell> config -t mytable -s table.split.threshold=1G
>
> -Eric
>
> On Mon, May 21, 2012 at 12:24 PM, Perko, Ralph J <Ralph.Perko@pnnl.gov> wrote:
> Hi,
>
> I am looking for advice on how to best lay out my table splits.  I have a 7
> node cluster and my table contains ~10M records.  I would like to split the
> table equally across all the servers; however, I see no utility to do this.
> I understand I can create splits for some letter range, but I was hoping
> for a way to have Accumulo create "n" equal splits.  Is this possible?
> Right now the best way I see to handle this is to write a utility that
> iterates over the table, keeps a count, at some given value (table size /
> split count) spits out the beginning and end row, and then I create the
> split manually.
>
> Thanks,
> Ralph
>
> __________________________________________________
> Ralph Perko
> Pacific Northwest National Laboratory
>
>
>
>
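[For the original question of pre-creating "n" roughly equal splits, one
hedged sketch: generate evenly spaced hex split points and load them with the
shell's addsplits command. This assumes row keys are uniformly distributed
hex strings; if your keys are not uniform, the split points must come from
your actual data instead. The table name and file path are hypothetical.]

```shell
# Generate n-1 evenly spaced one-byte hex split points, assuming row keys
# are uniformly distributed hex strings (an assumption, not a given).
n=50
: > /tmp/splits.txt
i=1
while [ "$i" -lt "$n" ]; do
  # i-th split point: the i/n fraction of the 0x00-0xff byte range
  printf '%02x\n' $(( i * 256 / n )) >> /tmp/splits.txt
  i=$(( i + 1 ))
done

wc -l < /tmp/splits.txt   # 49 split points

# then load them in the accumulo shell:
#   addsplits -t mytable -sf /tmp/splits.txt
```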
