accumulo-user mailing list archives

From Billie J Rinaldi <billie.j.rina...@ugov.gov>
Subject Re: Using AccumuloOutputFormat, All Records Stored In One Tablet (Node)
Date Mon, 16 Apr 2012 20:45:57 GMT
On Monday, April 16, 2012 3:01:03 PM, "David Medinets" <david.medinets@gmail.com> wrote:
> I'll ask another basic question. The row id values are stored as
> strings. So "1" and "1111" are sorted together. Let's say that I have
> five nodes. Would I run this?
> 
> addsplits 2 4 6 8 -t table

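That syntax looks correct. Four split points divide the table into five tablets, which can then be
spread across your five nodes. Those particular split points might or might not be what you want,
depending on the distribution of your data.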

Billie
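
Billie's caveat follows from how row ids are ordered. The following is a minimal Python sketch, assuming that Accumulo compares row ids as byte strings and that each split point is the inclusive end row of the tablet before it, so the four split points from `addsplits 2 4 6 8` give five tablets. `tablet_index` and `tablet_counts` are hypothetical helpers for illustration, not Accumulo API, and the 1..999 data set is made up.

```python
from bisect import bisect_left
from collections import Counter

# Split points as given in the shell command: addsplits 2 4 6 8 -t table
SPLITS = ["2", "4", "6", "8"]


def tablet_index(row, splits):
    """Return the 0-based tablet that would hold `row` (hypothetical helper).

    Assumes each split point is the inclusive end row of the preceding tablet:
    (-inf, "2"], ("2", "4"], ("4", "6"], ("6", "8"], ("8", +inf).
    """
    return bisect_left(splits, row)


def tablet_counts(rows, splits):
    """Count how many of `rows` would land in each tablet."""
    counts = Counter(tablet_index(r, splits) for r in rows)
    return [counts[i] for i in range(len(splits) + 1)]


if __name__ == "__main__":
    # Row ids are compared as strings, so "1111" sorts before "2".
    print(sorted(["2", "1111", "1", "10"]))  # ['1', '10', '1111', '2']
    # Hypothetical data set: the integers 1..999 stored as unpadded strings.
    rows = [str(n) for n in range(1, 1000)]
    print(tablet_counts(rows, SPLITS))  # [112, 222, 222, 222, 221]
```

With unpadded numeric ids, the first tablet only receives ids beginning with "1" (plus "2" itself), so it ends up about half the size of the other four. Looking at the real row ids, rather than guessing, is what tells you whether a given set of split points is balanced.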


> On Mon, Apr 16, 2012 at 2:55 PM, David Medinets <david.medinets@gmail.com> wrote:
> > Argh... Just to be clear: the splits are essentially partitions of
> > the row ids?
> >
> > Can I add splits after the data is ingested? If so, how can I
> > redistribute?
> >
> > On Mon, Apr 16, 2012 at 2:45 PM, Eric Newton <eric.newton@gmail.com> wrote:
> >> Create the table with splits, but this requires you to know something
> >> about the distribution of your data.
> >>
> >> -Eric
> >>
> >>
> >> On Mon, Apr 16, 2012 at 2:38 PM, David Medinets <david.medinets@gmail.com> wrote:
> >>>
> >>> Hopefully I am doing something wrong that can be easily rectified. I
> >>> have a Hadoop job that is sending well over 200M entries into Accumulo,
> >>> but every entry is being sent to a single node. The table was created
> >>> by the Hadoop job.
> >>>
> >>> How can I get the entries to be spread over several nodes?
> >>
> >>
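
Eric's advice, to create the table with splits chosen from what you know about the data's distribution, can be sketched by taking evenly spaced quantiles from a sorted sample of row ids. This is a minimal Python sketch, not an Accumulo API: `choose_splits` is a hypothetical helper, the sample is assumed to be representative of the full input, and split points are assumed to be inclusive tablet end rows. It only prints an `addsplits` command in the syntax used in this thread; it does not contact Accumulo.

```python
from bisect import bisect_left


def choose_splits(sample_rows, num_tablets):
    """Hypothetical helper: return up to num_tablets - 1 split points that divide
    the sorted, de-duplicated sample into roughly equal-sized tablets.

    Assumes each split point is the inclusive end row of the preceding tablet.
    """
    rows = sorted(set(sample_rows))
    splits = []
    for i in range(1, num_tablets):
        cut = i * len(rows) // num_tablets  # sample rows that fall before boundary i
        if cut == 0:
            continue  # too few sample rows to place this boundary
        candidate = rows[cut - 1]  # last sample row belonging to tablet i - 1
        if not splits or candidate > splits[-1]:
            splits.append(candidate)  # keep split points strictly increasing
    return splits


if __name__ == "__main__":
    # Hypothetical sample: row ids "1".."999" stored as unpadded strings.
    sample = [str(n) for n in range(1, 1000)]
    splits = choose_splits(sample, 5)
    # Check how the sample would spread over the resulting five tablets.
    counts = [0] * (len(splits) + 1)
    for row in sample:
        counts[bisect_left(splits, row)] += 1
    print(counts)  # [199, 200, 200, 200, 200]
    print("addsplits " + " ".join(splits) + " -t table")
```

Splitting on quantiles of the data itself, rather than on evenly spaced digits, is what keeps the tablets close to equal in size. Creating the table with such splits before the Hadoop job runs should let the writes spread across several tablets instead of landing in a single one.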
