kudu-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dan Burkert <...@cloudera.com>
Subject Re: Proposal: remove default partitioning for new tables
Date Thu, 02 Jun 2016 17:21:41 GMT
This change was intentional, since I wanted to thoroughly remove the
defaults in one go.  We can always add back implicit range columns when
range splits are specified later if we decide it's too much of a burden.  I
personally like the symmetry with hash partitioning, where setting the
columns explicitly is required.  It also makes it clear that you _do not_
get range partitioning if you do not set the range partitioning columns.  I
can see it  being somewhat confusing if we implicitly set range
partitioning columns when there are range splits, but not when there are
hash partitions.

 - Dan

On Thu, Jun 2, 2016 at 8:40 AM, Todd Lipcon <todd@cloudera.com> wrote:

> Hey Dan,
>
> One quick thing I just stumbled upon... it seems like the old behavior was
> that you could do the following:
>
> CreateTableOptions builder = new CreateTableOptions();
> builder.addSplitRow(...);
> builder.addSplitRow(...);
> ...
> client.createTable("foo", schema, builder);
>
> and it would assume that this was range partitioning based on the whole
> primary key. The user in this case _is_ specifying split rows, so I figured
> this counted as an explicit partitioning choice and thus wouldn't be
> affected by the change mentioned above. Instead, I'm getting an error that
> no range partition columns were specified.
>
> Was this on purpose? Of course I can call setRangePartitionColumns to work
> around it, but didn't know if it was intentional.
>
> -Todd
>
> On Thu, May 26, 2016 at 11:53 PM, Dan Burkert <dan@cloudera.com> wrote:
>
> > Hi all,
> >
> > Thanks for the feedback!  We've made this change, and it will be part of
> > the upcoming 0.9 release.  Going forward, all create table calls must
> have
> > partitioning specified.  Existing tables will not be affected.
> >
> > - Dan
> >
> > On Fri, May 20, 2016 at 6:41 AM, Jordan Birdsell <
> > jordan.birdsell.kdvm@statefarm.com> wrote:
> >
> > > +1 ...this is a great recommendation
> > >
> > > -----Original Message-----
> > > From: Sand Stone [mailto:sand.m.stone@gmail.com]
> > > Sent: Thursday, May 19, 2016 10:39 PM
> > > To: user@kudu.incubator.apache.org
> > > Cc: dev@kudu.incubator.apache.org
> > > Subject: Re: Proposal: remove default partitioning for new tables
> > >
> > > Agreed that this is a sensible API change.
> > >
> > > On Thu, May 19, 2016 at 4:07 PM, Abhi Basu <9000revs@gmail.com> wrote:
> > >
> > > > I think this a very reasonable feature request. I have recently
> started
> > > > working with Kudu and the "default" behavior has already tripped me
> up
> > a
> > > > couple times.
> > > >
> > > > Thanks,
> > > >
> > > > Abhi
> > > >
> > > > On Thu, May 19, 2016 at 4:03 PM, Dan Burkert <danburkert@apache.org>
> > > > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> One of the issues that trips up new Kudu users is the uncertainty
> > about
> > > >> how partitioning works, and how to use partitioning effectively.
> Much
> > > of
> > > >> this can be addressed with better documentation and explanatory
> > > materials,
> > > >> and that should be an area of focus leading up to our 1.0 release.
> > > However,
> > > >> the default partitioning behavior is suboptimal, and changing the
> > > default
> > > >> could lead to significantly less user confusion and frustration.
> > > Currently,
> > > >> when creating a new table, Kudu defaults to using only a single
> > tablet,
> > > >> which is a known anti-pattern.  This can be painful for users who
> > > create a
> > > >> table assuming Kudu will have good defaults, and begin loading data
> > > only to
> > > >> find out later that they will need to recreate the table with
> > > partitioning
> > > >> to achieve good results.
> > > >>
> > > >> A better default partitioning strategy might be hash partitioning
> over
> > > >> the primary key columns, with a number of hash buckets based on the
> > > number
> > > >> of tablet servers (perhaps something like 3x the number of tablet
> > > >> servers).  This would alleviate the worst scalability issues with
> the
> > > >> current default, however it has a few downsides of its own. Hash
> > > >> partitioning is not appropriate for every use case, and any
> > > rule-of-thumb
> > > >> number of tablets we could come up with will not always be optimal.
> > > >>
> > > >> Given that there is no bullet-proof default, and that changing
> > > >> partitioning strategy after table creation is impossible, and
> changing
> > > the
> > > >> default partitioning strategy is a backwards incompatible change,
I
> > > propose
> > > >> we remove the default altogether.  Users would be required to
> > explicitly
> > > >> specify the table partitioning during creation, and failing to do
so
> > > would
> > > >> result in an illegal argument error.  Users who really do want only
> a
> > > >> single tablet will still be able to do so by explicitly configuring
> > > range
> > > >> partitioning with no split rows.
> > > >>
> > > >> I'd like to get community feedback on whether this seems like a good
> > > >> direction to take.  I have put together a patch, you can check out
> the
> > > >> changes to test files to see what it looks like to add partitioning
> > > >> explicitly in cases where the default was being relied on.
> > > >> http://gerrit.cloudera.org:8080/#/c/3131/
> > > >>
> > > >> - Dan
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Abhi Basu
> > > >
> > >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Mime
View raw message