kudu-user mailing list archives

From Chao Sun <sunc...@uber.com>
Subject Re: Low ingestion rate from Kafka
Date Tue, 31 Oct 2017 07:07:21 GMT
OK. Thanks! I changed to manual flush mode and it increased to ~15K / sec.
:)
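
For anyone reading along, here is a minimal sketch of the batching pattern that manual flush implies: buffer rows locally and send them in one round trip per batch instead of one per row. It is an illustration under assumptions, not the actual ingest code. RowSink, CountingSink, and BatchingWriter are hypothetical names, and the sink only counts simulated round trips. In the Kudu Java client the corresponding calls would be KuduSession.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH), KuduSession.apply(...), and KuduSession.flush(), with the session's mutation buffer (setMutationBufferSpace) sized to hold at least one batch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a session that buffers writes until flush(). */
interface RowSink {
    void apply(String row); // buffer one row locally, no round trip
    void flush();           // send all buffered rows as a single batch
}

/** In-memory sink that counts how many batches (round trips) were sent. */
final class CountingSink implements RowSink {
    private final List<String> buffer = new ArrayList<>();
    int roundTrips = 0;
    int rowsWritten = 0;

    @Override
    public void apply(String row) {
        buffer.add(row);
    }

    @Override
    public void flush() {
        if (buffer.isEmpty()) {
            return; // nothing buffered, so no round trip is needed
        }
        roundTrips++;
        rowsWritten += buffer.size();
        buffer.clear();
    }
}

final class BatchingWriter {
    /** Applies every row and flushes once per batchSize rows, plus once for the tail. */
    static void writeAll(Iterable<String> rows, RowSink sink, int batchSize) {
        int pending = 0;
        for (String row : rows) {
            sink.apply(row);
            pending++;
            if (pending >= batchSize) {
                sink.flush();
                pending = 0;
            }
        }
        sink.flush(); // send any rows left over from the last partial batch
    }
}
```

With batchSize set to 1 this degenerates to one round trip per row, which is the behavior Todd described for the SYNC flush mode. The batch size itself is a tuning knob and the right value here is not something this thread establishes.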

Is there any other tuning I can do to further improve this? Also, how much
would SSDs help in this case (upserts only)?

Thanks again,
Chao

On Mon, Oct 30, 2017 at 11:42 PM, Todd Lipcon <todd@cloudera.com> wrote:

> If you want to manage batching yourself you can use the manual flush mode.
> Easiest would be the auto flush background mode.
>
> Todd
>
> On Oct 30, 2017 11:10 PM, "Chao Sun" <sunchao@uber.com> wrote:
>
>> Hi Todd,
>>
>> Thanks for the reply! I used a single Kafka consumer to pull the data.
>> For Kudu, I was doing something very simple that basically just follows
>> the example here
>> <https://github.com/cloudera/kudu-examples/blob/master/java/java-sample/src/main/java/org/kududb/examples/sample/Sample.java>
>> .
>> Specifically:
>>
>> loop {
>>   Insert insert = kuduTable.newInsert();
>>   PartialRow row = insert.getRow();
>>   // fill the columns
>>   kuduSession.apply(insert);
>> }
>>
>> I didn't specify the flush mode, so I assume it picks up
>> AUTO_FLUSH_SYNC as the default?
>> Should I use MANUAL_FLUSH?
>>
>> Thanks,
>> Chao
>>
>> On Mon, Oct 30, 2017 at 10:39 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>
>>> Hey Chao,
>>>
>>> Nice to hear you are checking out Kudu.
>>>
>>> What are you using to consume from Kafka and write to Kudu? Is it
>>> possible that it is Java code and you are using the SYNC flush mode? That
>>> would result in a separate round trip for each record and thus very low
>>> throughput.
>>>
>>> Todd
>>>
>>> On Oct 30, 2017 10:23 PM, "Chao Sun" <sunchao@uber.com> wrote:
>>>
>>> Hi,
>>>
>>> We are evaluating Kudu (version kudu 1.3.0-cdh5.11.1, revision
>>> af02f3ea6d9a1807dcac0ec75bfbca79a01a5cab) on an 8-node cluster.
>>> The data are coming from Kafka at a rate of around 30K / sec, and hash
>>> partitioned into 128 buckets. However, with default settings, Kudu can only
>>> consume the topics at a rate of around 1.5K / second. This is a direct
>>> ingest with no transformation on the data.
>>>
>>> Could this be because I was using the default configuration? Also, we are
>>> running Kudu on HDDs. Could that also be related?
>>>
>>> Any help would be appreciated. Thanks.
>>>
>>> Best,
>>> Chao
>>>
>>>
>>>
>>
