kudu-user mailing list archives

From Mac Noland <mcdonaldnol...@gmail.com>
Subject Re: co-locating kudu table servers with HDFS data nodes
Date Wed, 22 Nov 2017 19:34:59 GMT
I'm still in Kudu kindergarten, but here is the most common configuration we
run across our client base.  Happy to take feedback.

- We run tablet servers on all of our worker nodes.

- We use the same 'data' disks for HDFS and Kudu.

- WAL files go on a separate disk, preferably SSD.

- I'm still on my Kudu learning curve, but I believe the distribution is
controlled by how many partitions you specify at table creation.  The link
below is a read that probably helps.  We probably should spend more time up
front analyzing our requirements, but we generally match the number of
partitions to the number of tablet servers for all tables.  Happy to take
feedback on that.

https://kudu.apache.org/docs/schema_design.html#partitioning
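
As a rough sketch of the "match partitions to tablet servers" habit described above, here is a small Python helper that builds an Impala CREATE TABLE statement for a hash-partitioned Kudu table. The helper name, table name, columns, and tablet server count are all made up for illustration; only the PARTITION BY HASH ... PARTITIONS n STORED AS KUDU clause comes from the Impala/Kudu integration syntax, and it only produces the SQL text rather than talking to a cluster.

```python
def kudu_create_table_sql(table, columns, primary_key, hash_columns, num_partitions):
    """Build Impala DDL for a hash-partitioned Kudu table (hypothetical helper)."""
    # Kudu rejects hash partitioning with fewer than two buckets.
    if num_partitions < 2:
        raise ValueError("hash partitioning needs at least 2 partitions")
    # Hash partition columns must be part of the primary key.
    missing = [c for c in hash_columns if c not in primary_key]
    if missing:
        raise ValueError(f"hash columns must be in the primary key: {missing}")
    column_defs = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n"
        f"{column_defs},\n"
        f"  PRIMARY KEY ({', '.join(primary_key)})\n"
        f")\n"
        f"PARTITION BY HASH ({', '.join(hash_columns)}) PARTITIONS {num_partitions}\n"
        f"STORED AS KUDU"
    )


# Assumption for the example: a cluster with 6 tablet servers.
num_tablet_servers = 6
sql = kudu_create_table_sql(
    table="events",                                   # hypothetical table
    columns=[("id", "BIGINT"), ("payload", "STRING")],
    primary_key=["id"],
    hash_columns=["id"],
    num_partitions=num_tablet_servers,                # one partition per tablet server
)
print(sql)
```

For a small table like the one in the question, the same helper can be called with a smaller num_partitions to compare layouts; whether fewer tablets actually performs better there is something to measure rather than assume.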

On Tue, Nov 21, 2017 at 6:17 PM, Sunil Parmar <sunilosunil@gmail.com> wrote:

> We are using CDH 5.12 and using HDFS for our primary data storage and
> Impala for querying them. Our worker node hosts both HDFS datanode and
> Impalad services. We're starting to move some of our data into KUDU and
> would like to understand the community's experience and recommendations on
> disk/machine allocation and the pros/cons of each.
>
> Install KUDU tablet server on each worker node vs separate machine
> Separate physical disks for KUDU tablet server on same machine vs sharing
> the disk with data nodes
> SSD vs spinning disks
>
> Some more questions on a separate note, but kinda related to the POC.
> We have a small table as a first candidate for KUDU ( a couple of GB before
> replication ). Does KUDU try to distribute data across tablet servers
> for each table, i.e. could we see slow performance from data spread too
> sparsely? For a small table, which is better: fewer disk partitions
> ( host-partition ) or even distribution across worker nodes?
>
> Thanks,
> Sunil Parmar
>
