kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: "broadcast" tablet replication for kudu?
Date Fri, 16 Mar 2018 19:05:49 GMT
I should also mention that we currently limit the number of replicas of a
table to 7 via the '--max-num-replicas' flag. To raise that limit you'd
have to enable --unlock-unsafe-flags, which means you're going into
untested territory. Your mileage may vary, but I wouldn't try it on a
production system.


On Fri, Mar 16, 2018 at 12:00 PM, Dan Burkert <danburkert@apache.org> wrote:

> On Fri, Mar 16, 2018 at 11:35 AM, Clifford Resnick <cresnick@mediamath.com> wrote:
>> Thanks for that, glad I was wrong there! Aside from replication
>> considerations, is it also recommended the number of tablet servers be odd?
> No, so long as you have enough tablet servers to host your desired
> replication factor you should be fine.  In production scenarios we
> typically recommend at least 4, since if you are 3x replicated and suffer a
> permanent node failure, the 4th node comes in handy as a fail-over target
> (Kudu will do this automatically).  But above and beyond that you don't
> need to worry about odd/even WRT number of tablet servers.
> - Dan
>> From: Dan Burkert <danburkert@apache.org>
>> Reply-To: "user@kudu.apache.org" <user@kudu.apache.org>
>> Date: Friday, March 16, 2018 at 2:09 PM
>> To: "user@kudu.apache.org" <user@kudu.apache.org>
>> Subject: Re: "broadcast" tablet replication for kudu?
>> The replication count is the number of tablet servers which Kudu will
>> host copies on.  So if you set the replication level to 5, Kudu will put
>> the data on 5 separate tablet servers.  There's no built-in broadcast table
>> feature; upping the replication factor is the closest thing.  A couple of
>> things to keep in mind:
>> - Always use an odd replication count.  This is important due to how the
>> Raft algorithm works.  Recent versions of Kudu won't even let you specify
>> an even number without flipping some flags.
>> - We don't test much beyond 5 replicas.  It *should* work, but you
>> may run into issues since it's a relatively rare configuration.  With a
>> heavy write workload and many replicas you are even more likely to
>> encounter issues.
>> It's also worth checking in an Impala forum whether it has features that
>> make joins against small broadcast tables better.  Perhaps Impala can cache
>> small tables locally when doing joins.
>> - Dan
>> On Fri, Mar 16, 2018 at 10:55 AM, Clifford Resnick <
>> cresnick@mediamath.com> wrote:
>>> The problem is, AFAIK, that replication count is not necessarily the
>>> distribution count, so you can't guarantee all tablet servers will have a
>>> copy.
>>> On Mar 16, 2018 1:41 PM, Boris Tyukin <boris@boristyukin.com> wrote:
>>> I'm new to Kudu but we are also going to use Impala mostly with Kudu. We
>>> have a few tables that are small but used a lot. My plan is to replicate them
>>> more than 3 times. When you create a Kudu table, you can specify the number of
>>> replicated copies (3 by default), and I guess you can put a number there
>>> corresponding to the node count in your cluster. The downside is that you cannot
>>> change that number unless you recreate the table.
>>> On Fri, Mar 16, 2018 at 10:42 AM, Cliff Resnick <cresny@gmail.com>
>>> wrote:
>>>> We will soon be moving our analytics from AWS Redshift to Impala/Kudu.
>>>> One Redshift feature that we will miss is its ALL Distribution, where a
>>>> copy of a table is maintained on each server. We define a number of
>>>> metadata tables this way since they are used in nearly every query. We are
>>>> considering using Parquet in HDFS cache for these. Kudu would be a much
>>>> better fit for the update semantics, but we are worried about the additional
>>>> contention.  I'm wondering if having a Broadcast, or ALL, tablet
>>>> replication might be an easy feature to add to Kudu?
>>>> -Cliff

Todd Lipcon
Software Engineer, Cloudera
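
Editorial note: the advice in the thread to always use an odd replication count follows from majority-quorum arithmetic in Raft. The sketch below is only an illustration of that arithmetic, not Kudu code; the function names (majority, tolerated_failures, summarize) are hypothetical, and the default upper bound of 7 simply mirrors the replica limit mentioned at the top of this message.

```python
# Illustrative only: majority-quorum arithmetic for a Raft-style group.
# These helpers are hypothetical and are not part of any Kudu API.

def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority still remains."""
    return replicas - majority(replicas)


def summarize(max_replicas: int = 7):
    """Return (replicas, majority, tolerated_failures) for 1..max_replicas."""
    return [
        (n, majority(n), tolerated_failures(n))
        for n in range(1, max_replicas + 1)
    ]


if __name__ == "__main__":
    print("replicas  majority  tolerated failures")
    for n, quorum, failures in summarize():
        print(f"{n:>8}  {quorum:>8}  {failures:>18}")
```

Going from 3 to 4 replicas raises the majority from 2 to 3 but still tolerates only one failure, so the extra replica adds write cost without adding fault tolerance. The same holds for 5 versus 6.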
