Date: Mon, 16 Apr 2012 19:09:01 +0000 (GMT+00:00)
From: Billie J Rinaldi <billie.j.rinaldi@ugov.gov>
To: user@accumulo.apache.org
Message-ID: 
 <1560648745.394487.1334603341934.JavaMail.root@linzimmb04o.imo.intelink.gov>
In-Reply-To: 
 <CAOiJXP6d3r=uEeU1icvPJntdHiy2teT0Y+S37kr0-K4cDRic8g@mail.gmail.com>
Subject: Re: Using AccumuloOutputFormat, All Records Stored In One Tablet
 (Node)

On Monday, April 16, 2012 2:55:48 PM, "David Medinets" <david.medinets@gmail.com> wrote:
> argh ... Just to be clear. The splits are essentially partitions of
> the row id?

Yes. Each split point is a row ID that marks the end of a tablet's range.
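
To make that concrete, here is a minimal sketch in plain Java (no Accumulo dependency) of how a set of split points carves up the row space. It assumes each tablet holds the rows greater than the previous split point, up to and including its own split point, with one final tablet for everything past the last split. The class and method names are hypothetical, and Java String ordering stands in for Accumulo's byte-wise row comparison, which only matches for plain ASCII rows.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Illustrative only: maps a row ID to the tablet that would hold it. */
final class SplitLookup {
    // Split points kept in sorted order; each one is the end row of a tablet.
    private final TreeSet<String> splits;

    SplitLookup(String... splitPoints) {
        this.splits = new TreeSet<>(Arrays.asList(splitPoints));
    }

    /**
     * Returns the end row of the tablet containing the given row:
     * the smallest split point that is >= row. Returns null when the
     * row sorts after every split point (the last, open-ended tablet).
     */
    String tabletEndRow(String row) {
        return splits.ceiling(row);
    }

    /** N split points produce N + 1 tablets. */
    int tabletCount() {
        return splits.size() + 1;
    }
}
```

With no split points at all, every row falls into the single open-ended tablet, which matches what you are seeing with a freshly created table.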

> Can I add splits after the data is ingested? If so, how can I
> redistribute?

Yes. You can either add specific split points, or lower the table's split threshold (the table.split.threshold property) based on the size of the table. For example, if the table size is S bytes and you ideally want T tablets, set the split threshold to S/T. These calculations are rarely exact, so I would start with a threshold somewhat higher than S/T, let the table split, check whether the number of tablets is acceptable, then lower the threshold again if necessary.
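
As a rough sketch of the S/T arithmetic, the helper below computes a threshold in bytes and formats the shell command that would apply it. The class, method names, and example table name are hypothetical; it assumes the shell's config command and a threshold given as a plain byte count.

```java
/** Illustrative only: picks a split threshold from a table size and a target tablet count. */
final class SplitThreshold {
    private SplitThreshold() {}

    /** Returns S/T in bytes (integer division), never less than 1. */
    static long thresholdBytes(long tableSizeBytes, int targetTablets) {
        if (tableSizeBytes <= 0 || targetTablets <= 0) {
            throw new IllegalArgumentException("size and tablet count must be positive");
        }
        return Math.max(1L, tableSizeBytes / targetTablets);
    }

    /** Formats an Accumulo shell command that sets the threshold on a table. */
    static String shellCommand(String table, long thresholdBytes) {
        return "config -t " + table + " -s table.split.threshold=" + thresholdBytes;
    }
}
```

For a 64 GiB table and a target of 32 tablets this gives 2147483648 bytes (2 GiB). Since the estimate is rough, you might apply a larger value first and step down toward the computed one. For the other option, the shell's addsplits command takes explicit split points.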

Billie


> On Mon, Apr 16, 2012 at 2:45 PM, Eric Newton <eric.newton@gmail.com>
> wrote:
> > Create the table with splits, but this requires you to know something
> > about the distribution of your data.
> >
> > -Eric
> >
> >
> > On Mon, Apr 16, 2012 at 2:38 PM, David Medinets
> > <david.medinets@gmail.com>
> > wrote:
> >>
> >> Hopefully I am doing something wrong that can be easily rectified. I
> >> have a Hadoop job that is sending well over 200M entries into
> >> Accumulo. But every entry is being sent to a single node. The table
> >> was created by the Hadoop job.
> >>
> >> How can I get the entries to be spread over several nodes?
> >
> >