kudu-user mailing list archives

From Zhen Zhang <zhqu...@gmail.com>
Subject Re: Low ingestion rate from Kafka
Date Wed, 01 Nov 2017 01:20:55 GMT
Maybe you could increase the number of consumers? In my opinion, using more
threads to insert gives better throughput.
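
A minimal sketch of the "more threads" idea, in plain Java with no Kudu or Kafka dependency. `ParallelWriters`, `writeInParallel`, and `writerFactory` are hypothetical names introduced only for illustration. Each `Consumer` stands in for a per-thread writer; with the Kudu Java client that would presumably be one `KuduSession` per thread, since, as far as I know, sessions are not meant to be shared across threads. The round-robin split of records is an assumption, not something described in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Supplier;

class ParallelWriters {
    // Splits `records` round-robin across `threads` workers and returns the
    // number of records handed to a writer. Each worker asks `writerFactory`
    // for its own writer (hypothetical stand-in for one session per thread).
    static <T> long writeInParallel(List<T> records, int threads,
                                    Supplier<Consumer<T>> writerFactory) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong written = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int w = 0; w < threads; w++) {
                final int worker = w;
                futures.add(pool.submit(() -> {
                    Consumer<T> writer = writerFactory.get();
                    // Worker w handles records w, w + threads, w + 2 * threads, ...
                    for (int i = worker; i < records.size(); i += threads) {
                        writer.accept(records.get(i));
                        written.incrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for the worker and surface any failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("writer thread failed", e);
        } finally {
            pool.shutdown();
        }
        return written.get();
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            records.add("row-" + i);
        }
        // The writer here only prints; a real writer would apply the row to its sink.
        long n = writeInParallel(records, 4,
                () -> r -> System.out.println(Thread.currentThread().getName() + " wrote " + r));
        System.out.println("records written: " + n);
    }
}
```

With Kafka, the number of consumers in one group that actually receive data is capped by the topic's partition count, so adding threads beyond that only helps if the consuming and the writing are decoupled.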

2017-10-31 15:07 GMT+08:00 Chao Sun <sunchao@uber.com>:

> OK. Thanks! I changed to manual flush mode and it increased to ~15K / sec.
> :)
>
> Is there any other tuning I can do to improve this further? Also, how much
> would SSD help in this case (upsert only)?
>
> Thanks again,
> Chao
>
> On Mon, Oct 30, 2017 at 11:42 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> If you want to manage batching yourself you can use the manual flush
>> mode. Easiest would be the auto flush background mode.
>>
>> Todd
>>
>> On Oct 30, 2017 11:10 PM, "Chao Sun" <sunchao@uber.com> wrote:
>>
>>> Hi Todd,
>>>
>>> Thanks for the reply! I used a single Kafka consumer to pull the data.
>>> For Kudu, I was doing something very simple that basically just follows
>>> the example here
>>> <https://github.com/cloudera/kudu-examples/blob/master/java/java-sample/src/main/java/org/kududb/examples/sample/Sample.java>
>>> .
>>> Specifically:
>>>
>>> while (true) {  // one iteration per record pulled from Kafka
>>>   Insert insert = kuduTable.newInsert();
>>>   PartialRow row = insert.getRow();
>>>   // fill the columns
>>>   kuduSession.apply(insert);
>>> }
>>>
>>> I didn't specify the flush mode, so will it pick up AUTO_FLUSH_SYNC as
>>> the default? Should I use MANUAL_FLUSH?
>>>
>>> Thanks,
>>> Chao
>>>
>>> On Mon, Oct 30, 2017 at 10:39 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>>
>>>> Hey Chao,
>>>>
>>>> Nice to hear you are checking out Kudu.
>>>>
>>>> What are you using to consume from Kafka and write to Kudu? Is it
>>>> possible that it is Java code and you are using the SYNC flush mode? That
>>>> would result in a separate round trip for each record and thus very low
>>>> throughput.
>>>>
>>>> Todd
>>>>
>>>> On Oct 30, 2017 10:23 PM, "Chao Sun" <sunchao@uber.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We are evaluating Kudu (version kudu 1.3.0-cdh5.11.1, revision
>>>> af02f3ea6d9a1807dcac0ec75bfbca79a01a5cab) on an 8-node cluster.
>>>> The data are coming from Kafka at a rate of around 30K / sec, and hash
>>>> partitioned into 128 buckets. However, with default settings, Kudu can only
>>>> consume the topics at a rate of around 1.5K / second. This is a direct
>>>> ingest with no transformation on the data.
>>>>
>>>> Could this be because I was using the default configurations? Also, we
>>>> are using Kudu on HDD. Could that also be related?
>>>>
>>>> Any help would be appreciated. Thanks.
>>>>
>>>> Best,
>>>> Chao
>>>>
>>>>
>>>>
>>>
>
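
Todd's suggestion quoted above, managing batching yourself in manual flush mode, can be sketched roughly as follows. This is plain Java with no Kudu dependency: `ManualBatcher` and `flushFn` are hypothetical names, and `flushFn` stands in for whatever sends one buffered batch in a single round trip. With the Kudu Java client the corresponding calls would be `KuduSession.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH)`, then `apply(...)` per row and `flush()` per batch. The batch size is an arbitrary assumption to tune, not a value from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ManualBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> flushFn; // hypothetical: sends one batch in one round trip
    private final List<T> buffer = new ArrayList<>();
    private int flushCount = 0;

    ManualBatcher(int batchSize, Consumer<List<T>> flushFn) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flushFn = flushFn;
    }

    // Buffers one record; flushes automatically once the buffer is full.
    void apply(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered as a single batch. A no-op when the buffer is empty.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushFn.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
        buffer.clear();
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        ManualBatcher<String> batcher = new ManualBatcher<String>(1000,
                batch -> System.out.println("flushing " + batch.size() + " records"));
        for (int i = 0; i < 2500; i++) {
            batcher.apply("row-" + i);
        }
        batcher.flush(); // do not forget the final partial batch
        System.out.println("round trips: " + batcher.flushCount());
    }
}
```

The point of the sketch is the ratio: the number of round trips drops from one per record to one per batch, which is consistent with the jump from ~1.5K / sec to ~15K / sec reported above. The final explicit `flush()` matters, because otherwise the last partial batch is never sent.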
