kudu-user mailing list archives

From 张晓宁 <zhangxiaon...@jd.com>
Subject Re: A few questions for using Kudu
Date Fri, 16 Mar 2018 03:32:41 GMT
Thank you, Dan! My follow-up comments are marked with "XiaoNing:".

From: Dan Burkert [mailto:danburkert@apache.org]
Sent: March 16, 2018 1:06
To: user@kudu.apache.org
Subject: Re: A few questions for using Kudu

Hi, answers inline:
On Thu, Mar 15, 2018 at 3:12 AM, 张晓宁 <zhangxiaoning@jd.com> wrote:
I have a few questions about using Kudu:

1.       As more and more data is inserted into Kudu, performance decreases. After about
30 minutes of continuous insertion, TPS dropped by 20%, and after 1 hour of insertion it
dropped by 40%. Is this a known issue?
This is expected if you are inserting data in random order.  If you try another benchmark
where you insert data in primary key sorted order, you'll see that the performance will be
much higher, and more consistent.  If you have a heavy insert workload, this kind of optimization
is critical.  The table's partitioning and primary key can often be designed to make this
happen naturally, but it's a dataset dependent thing, so without more specifics about your
data it's difficult to give more precise advice.
 XiaoNing: Our table has two levels of partitioning. The first level is a range partition
by date (on the timestamp column), with one partition per day, and the second level is a
hash partition on 2 columns (key + host). These 3 columns (timestamp, key, host) are the
primary key of the table. Regarding your comment “insert data in primary key sorted order”,
do you mean we need to sort the data on the 3 primary-key columns before insertion?
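
One possible reading of "primary key sorted order" for the table described above is to sort each client-side batch on the primary-key columns before writing it. The following is only a minimal sketch under stated assumptions: rows are buffered as plain Python dicts, the primary-key column order is (timestamp, key, host) as listed in this thread, and the sample values and the name sort_by_primary_key are hypothetical. It does not call any Kudu client API; the sorted rows would then be handed to whatever client performs the inserts.

```python
from operator import itemgetter

# Assumed primary-key column order, taken from the table described in this thread.
PRIMARY_KEY = ("timestamp", "key", "host")


def sort_by_primary_key(rows):
    """Return a new list of row dicts ordered by the primary-key columns."""
    return sorted(rows, key=itemgetter(*PRIMARY_KEY))


# Hypothetical sample batch, deliberately out of order.
batch = [
    {"timestamp": 1521169200, "key": "cpu", "host": "b", "value": 0.7},
    {"timestamp": 1521169140, "key": "mem", "host": "a", "value": 0.4},
    {"timestamp": 1521169140, "key": "cpu", "host": "a", "value": 0.9},
]

sorted_batch = sort_by_primary_key(batch)
for row in sorted_batch:
    print(row["timestamp"], row["key"], row["host"])
```

Because the sort is over the whole batch, the subset of rows that lands in any one hash bucket is also in primary-key order.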

2.       When the replica number is set to 1, will I have 2 copies of the data in total
(1 master copy + 1 replica copy)? Is this true?
That's incorrect.  The master node does not hold any table data.  If you set the number of
replicas to be 1, you will lose data if you lose the tablet server which holds the replica.
 We always recommend that production workloads set the number of replicas to 3 in order to
have fault tolerance.
 XiaoNing: So if we want to have fault tolerance, we should set the replica number to at
least 3, right?
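
The replica arithmetic behind this question can be sketched as follows. This is an illustration only, assuming a majority-quorum replication scheme (Kudu replicates tablets with Raft consensus); the function name tolerated_failures is hypothetical and not part of any Kudu API.

```python
def tolerated_failures(num_replicas: int) -> int:
    """Replica failures a majority quorum can survive while staying available."""
    if num_replicas < 1:
        raise ValueError("num_replicas must be at least 1")
    majority = num_replicas // 2 + 1  # smallest group that is more than half
    return num_replicas - majority


for n in (1, 3, 5):
    print(n, "replica(s) tolerate", tolerated_failures(n), "failure(s)")
```

Under that assumption, 1 replica tolerates no failures and 3 replicas tolerate the loss of one, which is consistent with the recommendation above.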

3.       I want to install Kudu 1.6, but our machines cannot connect to the public internet.
Will the Kudu team build RPM packages for version 1.6?

The Apache Kudu project does not provide binary artifacts for releases; however, vendors can
and do.  For instance, you can find Cloudera's RPMs corresponding to Kudu 1.6 here:
https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/5.14/RPMS/x86_64/
 XiaoNing: Got it, thanks.
- Dan